Search Results

Search found 6515 results on 261 pages for 'half life'.

Page 259/261 | < Previous Page | 255 256 257 258 259 260 261 | Next Page >

Simple syntax error still eluding me.

- by melee

Here is the header for a class I started: #ifndef CANVAS_ #define CANVAS_ #include <iostream> #include <iomanip> #include <string> #include <stack> class Canvas { public: Canvas(); void Paint(int R, int C, char Color); const int Nrow; const int Ncol; string Title; int image[][100]; stack<int> path; struct PixelCoordinates { unsigned int r; unsigned int c; } position; Canvas operator<< (const Canvas& One ); Canvas operator>>( Canvas& One ); }; /*----------------------------------------------------------------------------- Name: operator<< Purpose: Put a Canvas into an output stream -----------------------------------------------------------------------------*/ ostream& operator<<( ostream& Out, const Canvas& One ) { Out << One.Title << endl; Out << "Rows: " << One.Nrow << " Columns: " << One.Ncol << endl; int i,j; for( i=0; i<One.Nrow; i++) { cout<<"\n\n\n"; cout<< " COLUMN\n"; cout<< " 1 2 3"; for(i=0;i<One.Nrow;i++) { cout<<"\nROW "<<i+1; for(j=0;j<One.Ncol;j++) cout<< One.image[i][j]; } } return Out; } /*----------------------------------------------------------------------------- Name: operator>> Purpose: Get a Canvas from an input stream -----------------------------------------------------------------------------*/ istream& operator>>( istream& In, Canvas& One ) { // string Line; // int Place = 0; // { // In >> Line; // if (In.good()) // { // One.image[Place][0] = Line; // Place++; // } // return In; #endif Here is my implementation file for class Canvas: using namespace std; #include <iostream> #include <iomanip> #include <string> #include <stack> #include "proj05.canvas.h" //----------------Constructor----------------// Canvas::Canvas() { Title = ""; Nrow = 0; Ncol = 0; image[][100] = {}; position.r = 0; position.c = 0; } //-------------------Paint------------------// void Canvas::Paint(int R, int C, char Color) { cout << "Paint to be implemented" << endl; } And the errors I'm getting are these: proj05.canvas.cpp: In function 'std::istream& operator>>(std::istream&, Canvas&)': proj05.canvas.cpp:11: error: expected `;' before '{' token proj05.canvas.cpp:24: error: expected `}' at end of input From my limited experience, they look like simple syntax errors but for the life of me, I cannot see what I am missing. I know putting a ; at the end of Canvas::Canvas() is wrong but that seems to be what it expects. Could someone please clarify for me? (Also, I know much of the code for the << and operator definitions look terrible, but unless that is the specific reason for the error please do not address it. This is a draft :) )

Read the article
Access violation using LocalAlloc()

- by PaulH

I have a Visual Studio 2008 Windows Mobile 6 C++ application that is using an API that requires the use of LocalAlloc(). To make my life easier, I created an implementation of a standard allocator that uses LocalAlloc() internally: /// Standard library allocator implementation using LocalAlloc and LocalReAlloc /// to create a dynamically-sized array. /// Memory allocated by this allocator is never deallocated. That is up to the /// user. template< class T, int max_allocations > class LocalAllocator { public: typedef T value_type; typedef size_t size_type; typedef ptrdiff_t difference_type; typedef T* pointer; typedef const T* const_pointer; typedef T& reference; typedef const T& const_reference; pointer address( reference r ) const { return &r; }; const_pointer address( const_reference r ) const { return &r; }; LocalAllocator() throw() : c_( NULL ) { }; /// Attempt to allocate a block of storage with enough space for n elements /// of type T. n>=1 && n<=max_allocations. /// If memory cannot be allocated, a std::bad_alloc() exception is thrown. pointer allocate( size_type n, const void* /*hint*/ = 0 ) { if( NULL == c_ ) { c_ = LocalAlloc( LPTR, sizeof( T ) * n ); } else { HLOCAL c = LocalReAlloc( c_, sizeof( T ) * n, LHND ); if( NULL == c ) LocalFree( c_ ); c_ = c; } if( NULL == c_ ) throw std::bad_alloc(); return reinterpret_cast< T* >( c_ ); }; /// Normally, this would release a block of previously allocated storage. /// Since that's not what we want, this function does nothing. void deallocate( pointer /*p*/, size_type /*n*/ ) { // no deallocation is performed. that is up to the user. }; /// maximum number of elements that can be allocated size_type max_size() const throw() { return max_allocations; }; private: /// current allocation point HLOCAL c_; }; // class LocalAllocator My application is using that allocator implementation in a std::vector< #define MAX_DIRECTORY_LISTING 512 std::vector< WIN32_FIND_DATA, LocalAllocator< WIN32_FIND_DATA, MAX_DIRECTORY_LISTING > > file_list; WIN32_FIND_DATA find_data = { 0 }; HANDLE find_file = ::FindFirstFile( folder.c_str(), &find_data ); if( NULL != find_file ) { do { // access violation here on the 257th item. file_list.push_back( find_data ); } while ( ::FindNextFile( find_file, &find_data ) ); ::FindClose( find_file ); } // data submitted to the API that requires LocalAlloc()'d array of WIN32_FIND_DATA structures SubmitData( &file_list.front() ); On the 257th item added to the vector<, the application crashes with an access violation: Data Abort: Thread=8e1b0400 Proc=8031c1b0 'rapiclnt' AKY=00008001 PC=03f9e3c8(coredll.dll+0x000543c8) RA=03f9ff04(coredll.dll+0x00055f04) BVA=21ae0020 FSR=00000007 First-chance exception at 0x03f9e3c8 in rapiclnt.exe: 0xC0000005: Access violation reading location 0x01ae0020. LocalAllocator::allocate is called with an n=512 and LocalReAlloc() succeeds. The actual Access Violation exception occurs within the std::vector< code after the LocalAllocator::allocate call: 0x03f9e3c8 0x03f9ff04 > MyLib.dll!stlp_std::priv::__copy_trivial(const void* __first = 0x01ae0020, const void* __last = 0x01b03020, void* __result = 0x01b10020) Line: 224, Byte Offsets: 0x3c C++ MyLib.dll!stlp_std::vector<_WIN32_FIND_DATAW,LocalAllocator<_WIN32_FIND_DATAW,512> >::_M_insert_overflow(_WIN32_FIND_DATAW* __pos = 0x01b03020, _WIN32_FIND_DATAW& __x = {...}, stlp_std::__true_type& __formal = {...}, unsigned int __fill_len = 1, bool __atend = true) Line: 112, Byte Offsets: 0x5c C++ MyLib.dll!stlp_std::vector<_WIN32_FIND_DATAW,LocalAllocator<_WIN32_FIND_DATAW,512> >::push_back(_WIN32_FIND_DATAW& __x = {...}) Line: 388, Byte Offsets: 0xa0 C++ MyLib.dll!Foo(unsigned long int cbInput = 16, unsigned char* pInput = 0x01a45620, unsigned long int* pcbOutput = 0x1dabfbbc, unsigned char** ppOutput = 0x1dabfbc0, IRAPIStream* __formal = 0x00000000) Line: 66, Byte Offsets: 0x1e4 C++ If anybody can point out what I may be doing wrong, I would appreciate it. Thanks, PaulH

Read the article
Flash AS3 Mysterious Blinking MovieClip

- by Ben

This is the strangest problem I've faced in flash so far. I have no idea what's causing it. I can provide a .swf if someone wants to actually see it, but I'll describe it as best I can. I'm creating bullets for a tank object to shoot. The tank is a child of the document class. The way I am creating the bullet is: var bullet:Bullet = new Bullet(); (parent as MovieClip).addChild(bullet); The bullet itself simply moves itself in a direction using code like this.x += 5; The problem is the bullets will trace for their creation and destruction at the correct times, however the bullet is sometimes not visible until half way across the screen, sometimes not at all, and sometimes for the whole traversal. Oddly removing the timer I have on bullet creation seems to solve this. The timer is implemented as such: if(shot_timer == 0) { shoot(); // This contains the aforementioned bullet creation method shot_timer = 10; My enter frame handler for the tank object controls the timer and decrements it every frame if it is greater than zero. Can anyone suggest why this could be happening? EDIT: As requested, full code: Bullet.as package { import flash.display.MovieClip; import flash.events.Event; public class Bullet extends MovieClip { public var facing:int; private var speed:int; public function Bullet():void { trace("created"); speed = 10; addEventListener(Event.ADDED_TO_STAGE,addedHandler); } private function addedHandler(e:Event):void { addEventListener(Event.ENTER_FRAME,enterFrameHandler); removeEventListener(Event.ADDED_TO_STAGE,addedHandler); } private function enterFrameHandler(e:Event):void { //0 - up, 1 - left, 2 - down, 3 - right if(this.x > 720 || this.x < 0 || this.y < 0 || this.y > 480) { removeEventListener(Event.ENTER_FRAME,enterFrameHandler); trace("destroyed"); (parent as MovieClip).removeChild(this); return; } switch(facing) { case 0: this.y -= speed; break; case 1: this.x -= speed; break; case 2: this.y += speed; break; case 3: this.x += speed; break; } } } } Tank.as: package { import flash.display.MovieClip; import flash.events.KeyboardEvent; import flash.events.Event; import flash.ui.Keyboard; public class Tank extends MovieClip { private var right:Boolean = false; private var left:Boolean = false; private var up:Boolean = false; private var down:Boolean = false; private var facing:int = 0; //0 - up, 1 - left, 2 - down, 3 - right private var horAllowed:Boolean = true; private var vertAllowed:Boolean = true; private const GRID_SIZE:int = 100; private var shooting:Boolean = false; private var shot_timer:int = 0; private var speed:int = 2; public function Tank():void { addEventListener(Event.ADDED_TO_STAGE,stageAddHandler); addEventListener(Event.ENTER_FRAME, enterFrameHandler); } private function stageAddHandler(e:Event):void { stage.addEventListener(KeyboardEvent.KEY_DOWN,checkKeys); stage.addEventListener(KeyboardEvent.KEY_UP,keyUps); removeEventListener(Event.ADDED_TO_STAGE,stageAddHandler); } public function checkKeys(event:KeyboardEvent):void { if(event.keyCode == 32) { //trace("Spacebar is down"); shooting = true; } if(event.keyCode == 39) { //trace("Right key is down"); right = true; } if(event.keyCode == 38) { //trace("Up key is down"); // lol up = true; } if(event.keyCode == 37) { //trace("Left key is down"); left = true; } if(event.keyCode == 40) { //trace("Down key is down"); down = true; } } public function keyUps(event:KeyboardEvent):void { if(event.keyCode == 32) { event.keyCode = 0; shooting = false; //trace("Spacebar is not down"); } if(event.keyCode == 39) { event.keyCode = 0; right = false; //trace("Right key is not down"); } if(event.keyCode == 38) { event.keyCode = 0; up = false; //trace("Up key is not down"); } if(event.keyCode == 37) { event.keyCode = 0; left = false; //trace("Left key is not down"); } if(event.keyCode == 40) { event.keyCode = 0; down = false; //trace("Down key is not down") // O.o } } public function checkDirectionPermissions(): void { if(this.y % GRID_SIZE < 5 || GRID_SIZE - this.y % GRID_SIZE < 5) { horAllowed = true; } else { horAllowed = false; } if(this.x % GRID_SIZE < 5 || GRID_SIZE - this.x % GRID_SIZE < 5) { vertAllowed = true; } else { vertAllowed = false; } if(!horAllowed && !vertAllowed) { realign(); } } public function realign():void { if(!horAllowed) { if(this.x % GRID_SIZE < GRID_SIZE / 2) { this.x -= this.x % GRID_SIZE; } else { this.x += (GRID_SIZE - this.x % GRID_SIZE); } } if(!vertAllowed) { if(this.y % GRID_SIZE < GRID_SIZE / 2) { this.y -= this.y % GRID_SIZE; } else { this.y += (GRID_SIZE - this.y % GRID_SIZE); } } } public function enterFrameHandler(Event):void { //trace(shot_timer); if(shot_timer > 0) { shot_timer--; } movement(); firing(); } public function firing():void { if(shooting) { if(shot_timer == 0) { shoot(); shot_timer = 10; } } } public function shoot():void { var bullet = new Bullet(); bullet.facing = facing; //0 - up, 1 - left, 2 - down, 3 - right switch(facing) { case 0: bullet.x = this.x; bullet.y = this.y - this.height / 2; break; case 1: bullet.x = this.x - this.width / 2; bullet.y = this.y; break; case 2: bullet.x = this.x; bullet.y = this.y + this.height / 2; break; case 3: bullet.x = this.x + this.width / 2; bullet.y = this.y; break; } (parent as MovieClip).addChild(bullet); } public function movement():void { //0 - up, 1 - left, 2 - down, 3 - right checkDirectionPermissions(); if(horAllowed) { if(right) { orient(3); realign(); this.x += speed; } if(left) { orient(1); realign(); this.x -= speed; } } if(vertAllowed) { if(up) { orient(0); realign(); this.y -= speed; } if(down) { orient(2); realign(); this.y += speed; } } } public function orient(dest:int):void { //trace("facing: " + facing); //trace("dest: " + dest); var angle = facing - dest; this.rotation += (90 * angle); facing = dest; } } }

Read the article
Why does this code generate an error? [closed]

- by user559601

<link rel="search" type="application/opensearchdescription+xml" href="http://b.static.ak.fbcdn.net/rsrc.php/yJ/r/H2SSvhJMJA-.xml" title="" /> <link rel="shortcut icon" href="http://static.ak.fbcdn.net/rsrc.php/y7/r/5875srnzL-I.ico" /></head> <body class="WelcomePage UIPage_LoggedOut mac Locale_en_US"> <div id="FB_HiddenContainer" style="position:absolute; top:-10000px; width:0px; height:0px;" ></div> <div id="blueBar" class="loggedOut"></div> <div id="globalContainer"><div id="dialogContainer"></div><div id="dropmenu_container"></div> <div id="content" class="fb_content clearfix"><div > <div class="WelcomePage_Container"><div class="loggedout_menubar_container"><div class="clearfix loggedout_menubar"><a class="lfloat" href="/" title="Go to Home"><img class="fb_logo img" src="http://static.ak.fbcdn.net/rsrc.php/yp/r/kk8dc2UJYJ4.png" alt="Facebook logo" width="170" height="36" /></a><div class="rfloat"><div class="menu_login_container"><form method="POST" action="https://login.facebook.com/login.php?login_attempt=1" id="login_form" onsubmit="return Event.__inlineSubmit(this,event)"><input type="hidden" name="charset_test" value="€,´,€,´,?,?,?" /><input type="hidden" name="lsd" value="hEICz" autocomplete="off" /><input type="hidden" id="locale" name="locale" value="en_US" autocomplete="off" /><table cellspacing="0"><tr><td class="html7magic"><label for="email">Email</label></td><td class="html7magic"><label for="pass">Password</label></td></tr><tr><td><input type="text" class="inputtext" name="email" id="email" tabindex="1" /></td><td><input type="password" class="inputtext" name="pass" id="pass" tabindex="2" /></td><td><label class="uiButton uiButtonConfirm"><input value="Login" tabindex="4" type="submit" /></label></td></tr><tr><td class="login_form_label_field"><input type="checkbox" class="inputcheckbox" value="1" id="persistent" name="persistent" checked="1" /><input type="hidden" name="default_persistent" value="1" /><label id="label_persistent" for="persistent">Keep me logged in</label></td><td class="login_form_label_field"><a href="http://www.facebook.com/reset.php" rel="nofollow">Forgot your password?</a></td></tr></table><input type="hidden" name="charset_test" value="€,´,€,´,?,?,?" /><input type="hidden" id="lsd" name="lsd" value="hEICz" autocomplete="off" /></form> </div></div></div></div><div class="WelcomePage_MainSell"><div class="WelcomePage_MainSellCenter clearfix"><div class="WelcomePage_MainSellLeft"><div class="WelcomePage_MainMessage">Facebook helps you connect and share with the people in your life.</div><div class="WelcomePage_MainMap"> </div></div><div class="WelcomePage_MainSellRight"><div class="WelcomePage_SignUpSection"><div class="WelcomePage_SignUpMessage"><div class="WelcomePage_SignUpHeadline">Sign Up</div><div class="WelcomePage_SignUpSubheadline">It's free, and always will be.</div></div><div class="WelcomePage_SimpleReg" id="registration_container"><div><noscript><div id="no_js_box"><h2>Javascript is disabled on your browser.</h2><p>Please enable JavaScript on your browser or upgrade to a Javascript-capable browser to register for Facebook.</p></div></noscript><div id="simple_registration_container" class="simple_registration_container"><div id="reg_box"><form method="post" id="reg" name="reg" onsubmit="return function(event){return false;}.call(this,event)!==false && Event.__inlineSubmit(this,event)"><input type="hidden" autocomplete="off" name="post_form_id" value="c3a2bed7243b9a85f53d69f23a38da9d" /><input type="hidden" name="lsd" value="hEICz" autocomplete="off" /><input type="hidden" autocomplete="off" id="reg_instance" name="reg_instance" value="bpgeTawHFSiHa5-7SR4gbif7" /><input type="hidden" autocomplete="off" id="locale" name="locale" value="en_US" /><input type="hidden" autocomplete="off" id="terms" name="terms" value="on" /><input type="hidden" autocomplete="off" id="abtest_registration_group" name="abtest_registration_group" value="1" /><input type="hidden" autocomplete="off" id="referrer" name="referrer" value="" /><input type="hidden" autocomplete="off" id="md5pass" name="md5pass" value="" /><input type="hidden" autocomplete="off" id="validate_mx_records" name="validate_mx_records" value="1" /><input type="hidden" autocomplete="off" id="ab_test_data" name="ab_test_data" value="" /><div id="reg_form_box" class="large_form"><table class="uiGrid editor" cellspacing="0" cellpadding="1"><tbody><tr><td class="label">First Name:</td><td><div class="field_container"><input type="text" class="inputtext" id="firstname" name="firstname" /></div></td></tr><tr><td class="label">Last Name:</td><td><div class="field_container"><input type="text" class="inputtext" id="lastname" name="lastname" /></div></td></tr><tr><td class="label">Your Email:</td><td><div class="field_container"><input type="text" class="inputtext" id="reg_email__" name="reg_email__" /></div></td></tr><tr><td class="label">Re-enter Email:</td><td><div class="field_container"><input type="text" class="inputtext" id="reg_email_confirmation__" name="reg_email_confirmation__" /></div></td></tr><tr><td class="label">New Password:</td><td><div class="field_container"><input type="password" class="inputtext" id="reg_passwd__" name="reg_passwd__" value="" /></div></td></tr><tr><td class="label">I am:</td><td><div class="field_container"><div class="hidden_elem"><select><option></option><option></option></select><select><option></option><option></option></select></div><select class="select" name="sex" id="sex"><option value="0">Select Sex:</option><option value="1">Female</option><option value="2">Male</option></select></div></td></tr><tr><td class="label">Birthday:</td><td><div class="field_container"> <select class="" id="birthday_month" name="birthday_month" onchange="return run_with(this, ["editor"], function() {editor_date_month_change(this, "birthday_day", "birthday_year");});"><option value="-1">Month:</option><option value="1">Jan</option>

Read the article
JSF and Jquery - doesn't work

- by darkrain

Hi I am trying to get Jquery work in JSF. But i doesn't work. Can somebody help me ? The scripts are in the folder : resources This is my JSP code : I am using netbeans and the <?xml version="1.0" encoding="UTF-8"?>  <jsp:root version="2.1" xmlns:f="http://java.sun.com/jsf/core" xmlns:h="http://java.sun.com/jsf/html" xmlns:jsp="http://java.sun.com/JSP/Page" xmlns:webuijsf="http://www.sun.com/webui/webuijsf"> <jsp:directive.page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"/> <f:view> <webuijsf:page id="page1"> <webuijsf:html id="html1"> <webuijsf:head id="head1"> <webuijsf:link id="link1" url="/resources/css/stylesheet.css"/> <webuijsf:script id="script1" url="resources/jquery.js"/> <webuijsf:script id="script2" url="recources/main.js" /> <style> body { margin:0; padding:40px; background:#fff; font:80% Arial, Helvetica, sans-serif; color:#555; line-height:180%; } h1{ font-size:180%; font-weight:normal; color:#555; } h2{ clear:both; font-size:160%; font-weight:normal; color:#555; margin:0; padding:.5em 0; } a{ text-decoration:none; color:#f30; } p{ clear:both; margin:0; padding:.5em 0; } pre{ display:block; font:100% "Courier New", Courier, monospace; padding:10px; border:1px solid #bae2f0; background:#e3f4f9; margin:.5em 0; overflow:auto; width:800px; } img{border:none;} ul,li{ margin:0; padding:0; } li{ list-style:none; float:left; display:inline; margin-right:10px; } /* */ #preview{ position:absolute; border:1px solid #ccc; background:#333; padding:5px; display:none; color:#fff; } /* */ </style> </webuijsf:head> <webuijsf:body id="body1" style="-rave-layout: grid"> <webuijsf:form id="form1"> <ul> <li> <a class="preview" href="resources/images/1.jpg"> <img alt="gallery thumbnail" src="resources/images/1s.jpg"/> </a> </li> <li> <a class="preview" href="resources/images/2.jpg"> <img alt="gallery thumbnail" src="resources/images/2s.jpg"/> </a> </li> <li> <a class="preview" href="resources/images/3.jpg"> <img alt="gallery thumbnail" src="resources/images/3s.jpg"/> </a> </li> <li> <a class="preview" href="resources/images/4.jpg"> <img alt="gallery thumbnail" src="resources/images/4s.jpg"/> </a> </li> </ul> </webuijsf:form> </webuijsf:body> </webuijsf:html> </webuijsf:page> </f:view> </jsp:root> Or has someone a real life example with Javascript ?!

Read the article
Dynamic parameters for XSLT 2.0 group-by

- by Ophileon

I got this input <?xml version="1.0" encoding="UTF-8"?> <result> <datapoint poiid="2492" period="2004" value="1240"/> <datapoint poiid="2492" period="2005" value="1290"/> <datapoint poiid="2492" period="2006" value="1280"/> <datapoint poiid="2492" period="2007" value="1320"/> <datapoint poiid="2492" period="2008" value="1330"/> <datapoint poiid="2492" period="2009" value="1340"/> <datapoint poiid="2492" period="2010" value="1340"/> <datapoint poiid="2492" period="2011" value="1335"/> <datapoint poiid="2493" period="2004" value="1120"/> <datapoint poiid="2493" period="2005" value="1120"/> <datapoint poiid="2493" period="2006" value="1100"/> <datapoint poiid="2493" period="2007" value="1100"/> <datapoint poiid="2493" period="2008" value="1100"/> <datapoint poiid="2493" period="2009" value="1110"/> <datapoint poiid="2493" period="2010" value="1105"/> <datapoint poiid="2493" period="2011" value="1105"/> </result> and I use this xslt 2.0 <?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0"> <xsl:output method="xml" indent="yes"/> <xsl:template match="result"> <xsl:for-each-group select="datapoint" group-by="@poiid"> <node type="poiid" id="{@poiid}"> <xsl:for-each select="current-group()"> <node type="period" id="{@period}" value="{@value}"/> </xsl:for-each> </node> </xsl:for-each-group> </xsl:template> </xsl:stylesheet> to convert it into <?xml version="1.0" encoding="UTF-8"?> <node type="poiid" id="2492"> <node type="period" id="2004" value="1240"/> <node type="period" id="2005" value="1290"/> <node type="period" id="2006" value="1280"/> <node type="period" id="2007" value="1320"/> <node type="period" id="2008" value="1330"/> <node type="period" id="2009" value="1340"/> <node type="period" id="2010" value="1340"/> <node type="period" id="2011" value="1335"/> </node> <node type="poiid" id="2493"> <node type="period" id="2004" value="1120"/> <node type="period" id="2005" value="1120"/> <node type="period" id="2006" value="1100"/> <node type="period" id="2007" value="1100"/> <node type="period" id="2008" value="1100"/> <node type="period" id="2009" value="1110"/> <node type="period" id="2010" value="1105"/> <node type="period" id="2011" value="1105"/> </node> Works smoothly. Where I got stuck is when I tried to make it more dynamic. The real life input has 6 attributes for each datapoint instead of 3, and the usecase requires the possibility to set the grouping parameters dynamically. I tried using parameters <xsl:param name="k1" select="'poiid'"/> <xsl:param name="k2" select="'period'"/> but passing them to the rest of the xslt is something that I can't get right. The code below doesn't work, but clarifies hopefully, what I'm looking for. <xsl:template match="result"> <xsl:for-each-group select="datapoint" group-by="@{$k1}"> <node type="{$k1}" id="@{$k1}"> <xsl:for-each select="current-group()"> <node type="{$k2}" id="@{$k2}" value="{@value}"/> </xsl:for-each> </node> </xsl:for-each-group> </xsl:template> Any help appreciated..

Read the article
Java: Cannot find a method's symbol even though that method is declared later in the class. The remaining code is looking for a class.

- by Midimistro

This is an assignment that we use strings in Java to analyze a phone number. The error I am having is anything below tester=invalidCharacters(c); does not compile because every line past tester=invalidCharacters(c); is looking for a symbol or the class. In get invalidResults, all I am trying to do is evaluate a given string for non-alphabetical characters such as *,(,^,&,%,@,#,), and so on. What to answer: Why is it producing an error, what will work, and is there an easier method WITHOUT using regex. Here is the link to the assignment: http://cis.csuohio.edu/~hwang/teaching/cis260/assignments/assignment9.html public class PhoneNumber { private int areacode; private int number; private int ext; /////Constructors///// //Third Constructor (given one string arg) "xxx-xxxxxxx" where first three are numbers and the remaining (7) are numbers or letters public PhoneNumber(String newNumber){ //Note: Set default ext to 0 ext=0; ////Declare Temporary Storage and other variables//// //for the first three numbers String areaCodeString; //for the remaining seven characters String newNumberString; //For use in testing the second half of the string boolean containsLetters; boolean containsInvalid; /////Separate the two parts of string///// //Get the area code part of the string areaCodeString=newNumber.substring(0,2); //Convert the string and set it to the area code areacode=Integer.parseInt(areaCodeString); //Skip the "-" and Get the remaining part of the string newNumberString=newNumber.substring(4); //Create an array of characters from newNumberString to reuse in later methods for int length=newNumberString.length(); char [] myCharacters= new char [length]; int i; for (i=0;i<length;i++){ myCharacters [i]=newNumberString.charAt(i); } //Test if newNumberString contains letters & converting them into numbers String reNewNumber=""; //Test for invalid characters containsInvalid=getInvalidResults(newNumberString,length); if (containsInvalid==false){ containsLetters=getCharResults(newNumberString,length); if (containsLetters==true){ for (i=0;i<length;i++){ myCharacters [i]=(char)convertLetNum((myCharacters [i])); reNewNumber=reNewNumber+myCharacters[i]; } } } if (containsInvalid==false){ number=Integer.parseInt(reNewNumber); } else{ System.out.println("Error!"+"\t"+newNumber+" contains illegal characters. This number will be ignored and skipped."); } } //////Primary Methods/Behaviors/////// //Compare this phone number with the one passed by the caller public boolean equals(PhoneNumber pn){ boolean equal; String concat=(areacode+"-"+number); String pN=pn.toString(); if (concat==pN){ equal=true; } else{ equal=false; } return equal; } //Convert the stored number to a certain string depending on extension public String toString(){ String returned; if(ext==0){ returned=(areacode+"-"+number); } else{ returned=(areacode+"-"+number+" ext "+ext); } return returned; } //////Secondary Methods/////// //Method for testing if the second part of the string contains any letters public static boolean getCharResults(String newNumString,int getLength){ //Recreate a character array int i; char [] myCharacters= new char [getLength]; for (i=0;i<getLength;i++){ myCharacters [i]=newNumString.charAt(i); } boolean doesContainLetter=false; int j; for (j=0;j<getLength;j++){ if ((Character.isDigit(myCharacters[j])==true)){ doesContainLetter=false; } else{ doesContainLetter=true; return doesContainLetter; } } return doesContainLetter; } //Method for testing if the second part of the string contains any letters public static boolean getInvalidResults(String newNumString,int getLength){ boolean doesContainInvalid=false; int i; char c; boolean tester; char [] invalidCharacters= new char [getLength]; for (i=0;i<getLength;i++){ invalidCharacters [i]=newNumString.charAt(i); c=invalidCharacters [i]; tester=invalidCharacters(c); if(tester==true)){ doesContainInvalid=false; } else{ doesContainInvalid=true; return doesContainInvalid; } } return doesContainInvalid; } //Method for evaluating string for invalid characters public boolean invalidCharacters(char letter){ boolean returnNum=false; switch (letter){ case 'A': return returnNum; case 'B': return returnNum; case 'C': return returnNum; case 'D': return returnNum; case 'E': return returnNum; case 'F': return returnNum; case 'G': return returnNum; case 'H': return returnNum; case 'I': return returnNum; case 'J': return returnNum; case 'K': return returnNum; case 'L': return returnNum; case 'M': return returnNum; case 'N': return returnNum; case 'O': return returnNum; case 'P': return returnNum; case 'Q': return returnNum; case 'R': return returnNum; case 'S': return returnNum; case 'T': return returnNum; case 'U': return returnNum; case 'V': return returnNum; case 'W': return returnNum; case 'X': return returnNum; case 'Y': return returnNum; case 'Z': return returnNum; default: return true; } } //Method for converting letters to numbers public int convertLetNum(char letter){ int returnNum; switch (letter){ case 'A': returnNum=2;return returnNum; case 'B': returnNum=2;return returnNum; case 'C': returnNum=2;return returnNum; case 'D': returnNum=3;return returnNum; case 'E': returnNum=3;return returnNum; case 'F': returnNum=3;return returnNum; case 'G': returnNum=4;return returnNum; case 'H': returnNum=4;return returnNum; case 'I': returnNum=4;return returnNum; case 'J': returnNum=5;return returnNum; case 'K': returnNum=5;return returnNum; case 'L': returnNum=5;return returnNum; case 'M': returnNum=6;return returnNum; case 'N': returnNum=6;return returnNum; case 'O': returnNum=6;return returnNum; case 'P': returnNum=7;return returnNum; case 'Q': returnNum=7;return returnNum; case 'R': returnNum=7;return returnNum; case 'S': returnNum=7;return returnNum; case 'T': returnNum=8;return returnNum; case 'U': returnNum=8;return returnNum; case 'V': returnNum=8;return returnNum; case 'W': returnNum=9;return returnNum; case 'X': returnNum=9;return returnNum; case 'Y': returnNum=9;return returnNum; case 'Z': returnNum=9;return returnNum; default: return 0; } } } Note: Please Do not use this program to cheat in your own class. To ensure of this, I will take this question down if it has not been answered by the end of 2013, if I no longer need an explanation for it, or if the term for the class has ended.

Read the article
JAVA-SQL- Data Migration - ResultSets comparing Failing JUnit test

- by user1865053

I CANNOT get this JUnit Test to pass for the life of me. Can somebody point out where this has gone wrong. I am doing a data migration(MSSQL SERVER 2005), but I have the sourceDBUrl and the targetDCUrl the same URL so to narrow it down to syntax errors. So that is what I have, a syntax error. I am comparing the results of a table for the query SELECT programmeapproval, resourceapproval FROM tr_timesheet WHERE timesheetid = ? and the test always fails, but passes for other junit tests I have developed. I created 3 diffemt resultSetsEqual methods and none work. Yet, some other JUnit tests I have developed have PASSED. THE QUERY: SELECT timesheetid, programmeapproval, resourceapproval FROM tr_timesheet Returns three columns timesheetid (PK,int, not null) (populated with a range of numbers 2240 - 2282) programmeapproval (smallint,not null) (populated with the number 1 in every field) resourceapproval (smallint, not null) (populated with a number 1 in every field) When I run the query that is embedded in the code it only returns one row with the programmeapproval and resourceapproval columns and both field populated with the number 1. I have all jdbc drivers correctly installed and tested for connectivity. The JUnit Test is failing at this point according to the IDE. assertTrue(helper.resultSetsEqual2(sourceVal,targetVal)); This is the code: /*THIS IS A JUNIT CLASS****? package a7.unittests.dao; import static org.junit.Assert.assertTrue; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.Types; import org.junit.Test; import artemispm.tritonalerts.TimesheetAlert; public class UnitTestTimesheetAlert { @Test public void testQUERY_CHECKALERT() throws Exception{ UnitTestHelper helper = new UnitTestHelper(); Connection con = helper.getConnection(helper.sourceDBUrl); Connection conTarget = helper.getConnection(helper.targetDBUrl); PreparedStatement stmt = con.prepareStatement("select programmeapproval, resourceapproval from tr_timesheet where timesheetid = ?"); stmt.setInt(1, 2240); ResultSet sourceVal = stmt.executeQuery(); stmt = conTarget.prepareStatement("select programmeapproval, resourceapproval from tr_timesheet where timesheetid = ?"); stmt.setInt(1,2240); ResultSet targetVal = stmt.executeQuery(); assertTrue(helper.resultSetsEqual2(sourceVal,targetVal)); }} /*END**/ /*THIS IS A REGULAR CLASS**/ package a7.unittests.dao; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.SQLException; public class UnitTestHelper { static String sourceDBUrl = "jdbc:sqlserver://127.0.0.1:1433;databaseName=a7itm;user=a7user;password=a7user"; static String targetDBUrl = "jdbc:sqlserver://127.0.0.1:1433;databaseName=a7itm;user=a7user;password=a7user"; public Connection getConnection(String url)throws Exception{ return DriverManager.getConnection(url); } public boolean resultSetsEqual3 (ResultSet rs1, ResultSet rs2) throws SQLException { int col = 1; //ResultSetMetaData metadata = rs1.getMetaData(); //int count = metadata.getColumnCount(); while (rs1.next() && rs2.next()) { final Object res1 = rs1.getObject(col); final Object res2 = rs2.getObject(col); // Check values if (!res1.equals(res2)) { throw new RuntimeException(String.format("%s and %s aren't equal at common position %d", res1, res2, col)); } // rs1 and rs2 must reach last row in the same iteration if ((rs1.isLast() != rs2.isLast())) { throw new RuntimeException("The two ResultSets contains different number of columns!"); } } return true; } public boolean resultSetsEqual (ResultSet source, ResultSet target) throws SQLException{ while(source.next()) { target.next(); ResultSetMetaData metadata = source.getMetaData(); int count = metadata.getColumnCount(); for (int i =1; i<=count; i++) { if(source.getObject(i) != target.getObject(i)) { return false; } } } return true; } public boolean resultSetsEqual2 (ResultSet source, ResultSet target) throws SQLException{ while(source.next()) { target.next(); ResultSetMetaData metadata = source.getMetaData(); int count = metadata.getColumnCount(); for (int i =1; i<=count; i++) { if(source.getObject(i).equals(target.getObject(i))) { return false; } } } return true; } } /END***/ /*PASTED NEW CLASS - THIS IS A JUNIT TEST CLASS*/ package a7.unittests.dao; import static org.junit.Assert.*; import java.sql.Connection; import java.sql.DriverManager; import org.junit.Test; public class TestDatabaseConnection { @Test public void testConnection() throws Exception{ UnitTestHelper helper = new UnitTestHelper(); Connection con = helper.getConnection(helper.sourceDBUrl); Connection conTarget = helper.getConnection(helper.targetDBUrl); assertTrue(con != null && conTarget != null); } } /**END***/

Read the article
How to optimize method's that track metrics in 3rd party application?

- by WulfgarPro

Hi, I have two listboxes that keep updated lists of various objects roaming in a 3rd party application. When a user selects an object from a listbox, an event handler is fired, calling a method that gathers various metrics belonging to that object from the 3rd party application for displaying in a set of textboxes. This is slow! I am not sure how to optimize this functionality to facilitate greater speeds.. private void lsbUavs_SelectedIndexChanged(object sender, EventArgs e) { if (_ourSelectedUavFromListBox != null) { UtilStkScenario.ChangeUavColourOnScenario(_ourSelectedUavFromListBox.UavName, false); } if (lsbUavs.SelectedItem != null) { Uav ourUav = UtilStkScenario.FindUavFromScenarioBasedOnName(lsbUavs.SelectedItem.ToString()); hsbThrottle.Value = (int)ourUav.ThrottleValue; UtilStkScenario.ChangeUavColourOnScenario(ourUav.UavName, true); _ourSelectedUavFromListBox = ourUav; // we don't want this thread spawning many times if (tUpdateMetricInformationInTabControl != null) { if (tUpdateMetricInformationInTabControl.IsAlive) { tUpdateMetricInformationInTabControl.Abort(); } } tUpdateMetricInformationInTabControl = new Thread(UpdateMetricInformationInTabControl); tUpdateMetricInformationInTabControl.Name = "UpdateMetricInformationInTabControlUavs"; tUpdateMetricInformationInTabControl.IsBackground = true; tUpdateMetricInformationInTabControl.Start(lsbUavs); } } delegate string GetNameOfListItem(ListBox listboxId); delegate void SetTextBoxValue(TextBox textBoxId, string valueToSet); private void UpdateMetricInformationInTabControl(object listBoxToUpdate) { ListBox theListBoxToUpdate = (ListBox)listBoxToUpdate; GetNameOfListItem dGetNameOfListItem = new GetNameOfListItem(GetNameOfSelectedListItem); SetTextBoxValue dSetTextBoxValue = new SetTextBoxValue(SetNamedTextBoxValue); try { foreach (KeyValuePair<string, IAgStkObject> entity in UtilStkScenario._totalListOfAllStkObjects) { if (entity.Key.ToString() == (string)theListBoxToUpdate.Invoke(dGetNameOfListItem, theListBoxToUpdate)) { while ((string)theListBoxToUpdate.Invoke(dGetNameOfListItem, theListBoxToUpdate) == entity.Key.ToString()) { if (theListBoxToUpdate.Name == "lsbEntities") { double[] latLonAndAltOfEntity = UtilStkScenario.FindMetricsOfStkObjectOnScenario(UtilStkScenario._stkObjectRoot.CurrentTime, entity.Value); SetEntityOrUavMetricValuesInTextBoxes(dSetTextBoxValue, "Entity", entity.Key, "", "", "", "", latLonAndAltOfEntity[4].ToString(), latLonAndAltOfEntity[3].ToString()); } else if (theListBoxToUpdate.Name == "lsbUavs") { double[] latLonAndAltOfEntity = UtilStkScenario.FindMetricsOfStkObjectOnScenario(UtilStkScenario._stkObjectRoot.CurrentTime, entity.Value); SetEntityOrUavMetricValuesInTextBoxes(dSetTextBoxValue, "UAV", entity.Key, entity.Value.ClassName.ToString(), latLonAndAltOfEntity[0].ToString(), latLonAndAltOfEntity[1].ToString(), latLonAndAltOfEntity[2].ToString(), latLonAndAltOfEntity[4].ToString(), latLonAndAltOfEntity[3].ToString()); } } } } } catch (Exception e) { // selected entity was deleted(end-of-life) in STK - remove LLA information from GUI if (theListBoxToUpdate.Name == "lsbEntities") { SetEntityOrUavMetricValuesInTextBoxes(dSetTextBoxValue, "Entity", "", "", "", "", "", "", ""); UtilLog.Log(e.Message.ToString(), e.GetType().ToString(), "UpdateMetricInformationInTabControl", UtilLog.logWriter); } else if (theListBoxToUpdate.Name == "lsbUavs") { SetEntityOrUavMetricValuesInTextBoxes(dSetTextBoxValue, "UAV", "", "", "", "", "", "", ""); UtilLog.Log(e.Message.ToString(), e.GetType().ToString(), "UpdateMetricInformationInTabControl", UtilLog.logWriter); } } } internal static double[] FindMetricsOfStkObjectOnScenario(object timeToFindMetricState, IAgStkObject stkObject) { double[] stkObjectMetrics = null; try { stkObjectMetrics = new double[5]; object latOfStkObject, lonOfStkObject; double altOfStkObject, headingOfStkObject, velocityOfStkObject; IAgProvideSpatialInfo spatial = stkObject as IAgProvideSpatialInfo; IAgVeSpatialInfo spatialInfo = spatial.GetSpatialInfo(false); IAgSpatialState spatialState = spatialInfo.GetState(timeToFindMetricState); spatialState.FixedPosition.QueryPlanetodetic(out latOfStkObject, out lonOfStkObject, out altOfStkObject); double[] stkObjectheadingAndVelocity = FindHeadingAndVelocityOfStkObjectFromScenario(stkObject.InstanceName); headingOfStkObject = stkObjectheadingAndVelocity[0]; velocityOfStkObject = stkObjectheadingAndVelocity[1]; stkObjectMetrics[0] = (double)latOfStkObject; stkObjectMetrics[1] = (double)lonOfStkObject; stkObjectMetrics[2] = altOfStkObject; stkObjectMetrics[3] = headingOfStkObject; stkObjectMetrics[4] = velocityOfStkObject; } catch (Exception e) { UtilLog.Log(e.Message.ToString(), e.GetType().ToString(), "FindMetricsOfStkObjectOnScenario", UtilLog.logWriter); } return stkObjectMetrics; } private static double[] FindHeadingAndVelocityOfStkObjectFromScenario(string stkObjectName) { double[] stkObjectHeadingAndVelocity = new double[2]; IAgStkObject stkUavObject = null; try { string typeOfObject = CheckIfStkObjectIsEntityOrUav(stkObjectName); if (typeOfObject == "UAV") { stkUavObject = _stkObjectRootToIsolateForUavs.CurrentScenario.Children[stkObjectName]; IAgDataProviderGroup group = (IAgDataProviderGroup)stkUavObject.DataProviders["Heading"]; IAgDataProvider provider = (IAgDataProvider)group.Group["Fixed"]; IAgDrResult result = ((IAgDataPrvTimeVar)provider).ExecSingle(_stkObjectRootToIsolateForUavs.CurrentTime); stkObjectHeadingAndVelocity[0] = (double)result.DataSets[1].GetValues().GetValue(0); stkObjectHeadingAndVelocity[1] = (double)result.DataSets[4].GetValues().GetValue(0); } else if (typeOfObject == "Entity") { IAgStkObject stkEntityObject = _stkObjectRootToIsolateForEntities.CurrentScenario.Children[stkObjectName]; IAgDataProviderGroup group = (IAgDataProviderGroup)stkEntityObject.DataProviders["Heading"]; IAgDataProvider provider = (IAgDataProvider)group.Group["Fixed"]; IAgDrResult result = ((IAgDataPrvTimeVar)provider).ExecSingle(_stkObjectRootToIsolateForEntities.CurrentTime); stkObjectHeadingAndVelocity[0] = (double)result.DataSets[1].GetValues().GetValue(0); stkObjectHeadingAndVelocity[1] = (double)result.DataSets[4].GetValues().GetValue(0); } } catch (Exception e) { UtilLog.Log(e.Message.ToString(), e.GetType().ToString(), "FindHeadingAndVelocityOfStkObjectFromScenario", UtilLog.logWriter); } return stkObjectHeadingAndVelocity; } Any help would be really appreciated. From my knowledge, I cant really see any issues with the C#. Maybe it has to do with the methodology I'm using.. maybe some kind of caching mechanism is required - this is not natively available. WulfgarPro

Read the article
mysqld crashes on any statement

- by ??iu

I restarted my slave to change configuration settings to skip reverse hostname lookup on connecting and to enable the slow query log. I edited /etc/my.cnf making only these changes, then restarted mysqld with /etc/init.d/mysql restart All appeared to be well but when I connect to msyqld remotely or locally though it connects okay a slight problem is that mysqld crashes whenever you try to issue any kind of statement. The client looks like: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.1.31-1ubuntu2-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show tables; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: mydb ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away Bus error The mysqld error log looks like: 101210 16:35:51 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:35:51 InnoDB: Assertion failure in thread 140245598570832 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:35:51 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=3 max_threads=600 threads_connected=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18209220 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f8d791580d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f902a76a080] /lib/libc.so.6(gsignal+0x35) [0x7f90291f8fb5] /lib/libc.so.6(abort+0x183) [0x7f90291fabc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f902a7623ba] /lib/libc.so.6(clone+0x6d) [0x7f90292abfcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18213c70 = thd->thread_id=3 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:35:51 mysqld_safe Number of processes running now: 0 101210 16:35:51 mysqld_safe mysqld restarted InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 101210 16:35:54 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 101210 16:35:56 InnoDB: Started; log sequence number 456 143528628 101210 16:35:56 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 16:35:56 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 16:35:56 [Note] Event Scheduler: Loaded 0 events 101210 16:35:56 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 16:36:11 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:36:11 InnoDB: Assertion failure in thread 139955151501648 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:36:11 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18588720 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f49d916f0d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f4c8a73f080] /lib/libc.so.6(gsignal+0x35) [0x7f4c891cdfb5] /lib/libc.so.6(abort+0x183) [0x7f4c891cfbc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f4c8a7373ba] /lib/libc.so.6(clone+0x6d) [0x7f4c89280fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18599950 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:36:11 mysqld_safe Number of processes running now: 0 101210 16:36:11 mysqld_safe mysqld restarted The config is [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] innodb_file_per_table innodb_buffer_pool_size=10G innodb_log_buffer_size=4M innodb_flush_log_at_trx_commit=2 innodb_thread_concurrency=8 skip-slave-start server-id=3 # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /DB2/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 600 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 32M # skip-federated slow-query-log skip-name-resolve Update: I followed the instructions as per http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html and set innodb_force_recovery = 4 and the logs are showing a different error but the behavior is still the same: 101210 19:14:15 mysqld_safe mysqld restarted 101210 19:14:19 InnoDB: Started; log sequence number 456 143528628 InnoDB: !!! innodb_force_recovery is set to 4 !!! 101210 19:14:19 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 19:14:19 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 19:14:19 [Note] Event Scheduler: Loaded 0 events 101210 19:14:19 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 19:14:32 InnoDB: error: space object of table mydb/__twitter_friend, InnoDB: space id 1602 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/access_request, InnoDB: space id 1318 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/activity, InnoDB: space id 1595 did not exist in memory. Retrying an open. 101210 19:14:32 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x1753c070 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f7a0b5800d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f7cbc350080] /usr/sbin/mysqld(ha_innobase::innobase_get_index(unsigned int)+0x46) [0x77c516] /usr/sbin/mysqld(ha_innobase::innobase_initialize_autoinc()+0x40) [0x77c640] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x3f3) [0x781f23] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f7cbc3483ba] /lib/libc.so.6(clone+0x6d) [0x7f7cbae91fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x1754d690 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.

Read the article
TCP packets larger than 4 KB don't get a reply from Linux

- by pts

I'm running Linux 3.2.51 in a virtual machine (192.168.33.15). I'm sending Ethernet frames to it. I'm writing custom software trying to emulate a TCP peer, the other peer is Linux running in the virtual machine guest. I've noticed that TCP packets larger than about 4 KB are ignored (i.e. dropped without an ACK) by the Linux guest. If I decrease the packet size by 50 bytes, I get an ACK. I'm not sending new payload data until the Linux guest fully ACKs the previous one. I've increased ifconfig eth0 mtu 51000, and ping -c 1 -s 50000 goes through (from guest to my emulator) and the Linux guest gets a reply of the same size. I've also increased sysctl -w net.ipv4.tcp_rmem='70000 87380 87380 and tried with sysctl -w net.ipv4.tcp_mtu_probing=1 (and also =0). There is no IPv3 packet fragmentation, all packets have the DF flag set. It works the other way round: the Linux guest can send TCP packets of 6900 bytes of payload and my emulator understands them. This is very strange to me, because only TCP packets seem to be affected (large ICMP packets go through). Any idea what can be imposing this limit? Any idea how to do debug it in the Linux kernel? See the tcpdump -n -vv output below. tcpdump was run on the Linux guest. The last line is interesting: 4060 bytes of TCP payload is sent to the guest, and it doesn't get any reply packet from the Linux guest for half a minute. 14:59:32.000057 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [S], cksum 0x8da0 (correct), seq 10000000, win 14600, length 0 14:59:32.000086 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44) 192.168.33.15.22 > 192.168.33.1.36522: Flags [S.], cksum 0xc37f (incorrect -> 0x5999), seq 1415680476, ack 10000001, win 19920, options [mss 9960], length 0 14:59:32.000218 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0xa752 (correct), ack 1, win 14600, length 0 14:59:32.000948 IP (tos 0x10, ttl 64, id 53777, offset 0, flags [DF], proto TCP (6), length 66) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], cksum 0xc395 (incorrect -> 0xfa01), seq 1:27, ack 1, win 19920, length 26 14:59:32.001575 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0xa738 (correct), ack 27, win 14600, length 0 14:59:32.001585 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 65) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], cksum 0x48d6 (correct), seq 1:26, ack 27, win 14600, length 25 14:59:32.001589 IP (tos 0x10, ttl 64, id 53778, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], cksum 0xc37b (incorrect -> 0x9257), ack 26, win 19920, length 0 14:59:32.001680 IP (tos 0x10, ttl 64, id 53779, offset 0, flags [DF], proto TCP (6), length 496) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 27:483, ack 26, win 19920, length 456 14:59:32.001784 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0xa557 (correct), ack 483, win 14600, length 0 14:59:32.006367 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 1136) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 26:1122, ack 483, win 14600, length 1096 14:59:32.044150 IP (tos 0x10, ttl 64, id 53780, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], cksum 0xc37b (incorrect -> 0x8c47), ack 1122, win 19920, length 0 14:59:32.045310 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 312) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 1122:1394, ack 483, win 14600, length 272 14:59:32.045322 IP (tos 0x10, ttl 64, id 53781, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], cksum 0xc37b (incorrect -> 0x8b37), ack 1394, win 19920, length 0 14:59:32.925726 IP (tos 0x10, ttl 64, id 53782, offset 0, flags [DF], proto TCP (6), length 1112) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], seq 483:1555, ack 1394, win 19920, length 1072 14:59:32.925750 IP (tos 0x10, ttl 64, id 53784, offset 0, flags [DF], proto TCP (6), length 312) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 1555:1827, ack 1394, win 19920, length 272 14:59:32.927131 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x9bcf (correct), ack 1555, win 14600, length 0 14:59:32.927148 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x9abf (correct), ack 1827, win 14600, length 0 14:59:32.932248 IP (tos 0x10, ttl 64, id 53785, offset 0, flags [DF], proto TCP (6), length 56) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], cksum 0xc38b (incorrect -> 0xd247), seq 1827:1843, ack 1394, win 19920, length 16 14:59:32.932366 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x9aaf (correct), ack 1843, win 14600, length 0 14:59:32.964295 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 104) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 1394:1458, ack 1843, win 14600, length 64 14:59:32.964310 IP (tos 0x10, ttl 64, id 53786, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], cksum 0xc37b (incorrect -> 0x85a7), ack 1458, win 19920, length 0 14:59:32.964561 IP (tos 0x10, ttl 64, id 53787, offset 0, flags [DF], proto TCP (6), length 88) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 1843:1891, ack 1458, win 19920, length 48 14:59:32.965185 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x9a3f (correct), ack 1891, win 14600, length 0 14:59:32.965196 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 104) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 1458:1522, ack 1891, win 14600, length 64 14:59:32.965233 IP (tos 0x10, ttl 64, id 53788, offset 0, flags [DF], proto TCP (6), length 88) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 1891:1939, ack 1522, win 19920, length 48 14:59:32.965970 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x99cf (correct), ack 1939, win 14600, length 0 14:59:32.965979 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 568) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 1522:2050, ack 1939, win 14600, length 528 14:59:32.966112 IP (tos 0x10, ttl 64, id 53789, offset 0, flags [DF], proto TCP (6), length 520) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 1939:2419, ack 2050, win 19920, length 480 14:59:32.970059 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x95df (correct), ack 2419, win 14600, length 0 14:59:32.970089 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 616) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 2050:2626, ack 2419, win 14600, length 576 14:59:32.981159 IP (tos 0x10, ttl 64, id 53790, offset 0, flags [DF], proto TCP (6), length 72) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], cksum 0xc39b (incorrect -> 0xa84f), seq 2419:2451, ack 2626, win 19920, length 32 14:59:32.982347 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x937f (correct), ack 2451, win 14600, length 0 14:59:32.982357 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 104) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 2626:2690, ack 2451, win 14600, length 64 14:59:32.982401 IP (tos 0x10, ttl 64, id 53791, offset 0, flags [DF], proto TCP (6), length 88) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 2451:2499, ack 2690, win 19920, length 48 14:59:32.982570 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x930f (correct), ack 2499, win 14600, length 0 14:59:32.982702 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 104) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 2690:2754, ack 2499, win 14600, length 64 14:59:33.020066 IP (tos 0x10, ttl 64, id 53792, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], cksum 0xc37b (incorrect -> 0x7e07), ack 2754, win 19920, length 0 14:59:33.983503 IP (tos 0x10, ttl 64, id 53793, offset 0, flags [DF], proto TCP (6), length 72) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], cksum 0xc39b (incorrect -> 0x2aa7), seq 2499:2531, ack 2754, win 19920, length 32 14:59:33.983810 IP (tos 0x10, ttl 64, id 53794, offset 0, flags [DF], proto TCP (6), length 88) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 2531:2579, ack 2754, win 19920, length 48 14:59:33.984100 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x92af (correct), ack 2531, win 14600, length 0 14:59:33.984139 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x927f (correct), ack 2579, win 14600, length 0 14:59:34.022914 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 104) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 2754:2818, ack 2579, win 14600, length 64 14:59:34.022939 IP (tos 0x10, ttl 64, id 53795, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.15.22 > 192.168.33.1.36522: Flags [.], cksum 0xc37b (incorrect -> 0x7d77), ack 2818, win 19920, length 0 14:59:34.023554 IP (tos 0x10, ttl 64, id 53796, offset 0, flags [DF], proto TCP (6), length 88) 192.168.33.15.22 > 192.168.33.1.36522: Flags [P.], seq 2579:2627, ack 2818, win 19920, length 48 14:59:34.027571 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.33.1.36522 > 192.168.33.15.22: Flags [.], cksum 0x920f (correct), ack 2627, win 14600, length 0 14:59:34.027603 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 4100) 192.168.33.1.36522 > 192.168.33.15.22: Flags [P.], seq 2818:6878, ack 2627, win 14600, length 4060

Read the article
qemu-kvm virtual machine virtio network freeze under load

- by Rick Koshi

I'm having a problem with my virtual machines, where the network will freeze under heavy load. I'm using CentOS 6.2 as both host and guest, not using libvirt, just running qemu-kvm directly as follows: /usr/libexec/qemu-kvm \ -drive file=/data2/vm/rb-dev2-www1-vm.img,index=0,media=disk,cache=none,if=virtio \ -boot order=c \ -m 2G \ -smp cores=1,threads=2 \ -vga std \ -name rb-dev2-www1-vm \ -vnc :84,password \ -net nic,vlan=0,macaddr=52:54:20:00:00:54,model=virtio \ -net tap,vlan=0,ifname=tap84,script=/etc/qemu-ifup \ -monitor unix:/var/run/vm/rb-dev2-www1-vm.mon,server,nowait \ -rtc base=utc \ -device piix3-usb-uhci \ -device usb-tablet /etc/qemu-ifup (used by the above command) is a very simple script, containing the following: #!/bin/sh sudo /sbin/ifconfig $1 0.0.0.0 promisc up sudo /usr/sbin/brctl addif br0 $1 sleep 2 And here's the info on br0 and other interfaces: avl-host3 14# brctl show bridge name bridge id STP enabled interfaces br0 8000.180373f5521a no bond0 tap84 virbr0 8000.525400858961 yes virbr0-nic avl-host3 15# ip addr show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000 link/ether 18:03:73:f5:52:1a brd ff:ff:ff:ff:ff:ff 3: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000 link/ether 18:03:73:f5:52:1a brd ff:ff:ff:ff:ff:ff 4: em3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000 link/ether 18:03:73:f5:52:1e brd ff:ff:ff:ff:ff:ff 5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000 link/ether 18:03:73:f5:52:20 brd ff:ff:ff:ff:ff:ff 6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 18:03:73:f5:52:1a brd ff:ff:ff:ff:ff:ff inet6 fe80::1a03:73ff:fef5:521a/64 scope link valid_lft forever preferred_lft forever 7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 18:03:73:f5:52:1a brd ff:ff:ff:ff:ff:ff inet 172.16.1.46/24 brd 172.16.1.255 scope global br0 inet6 fe80::1a03:73ff:fef5:521a/64 scope link valid_lft forever preferred_lft forever 8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 52:54:00:85:89:61 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 500 link/ether 52:54:00:85:89:61 brd ff:ff:ff:ff:ff:ff 12: tap84: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500 link/ether ba:e8:9b:2a:ff:48 brd ff:ff:ff:ff:ff:ff inet6 fe80::b8e8:9bff:fe2a:ff48/64 scope link valid_lft forever preferred_lft forever bond0 is a bond of em1 and em2. virbr0 and virbr0-nic are vestigial interfaces left over from CentOS's default installation. They are unused (as far as I know). The guest runs perfectly until I run a large 'rsync', when the network will freeze after some seemingly-random time (usually under a minute). When it freezes, there is no network activity in or out of the guest. I can still connect to the guest's console via vnc, but it is unable to speak out its network interface. Any attempt to 'ping' from the guest gives a "Destination Host Unreachable" error for 3/4 packets and no reply for every fourth packet. Sometimes (perhaps two thirds of the time), I can bring the interface back to life by doing a "service network restart" from the guest's console. If this works (and if I do it before the rsync times out), the rsync will resume. Usually it will freeze again within a minute or two. If I repeat, the rsync will eventually finish, and I presume the machine goes back to waiting for another period of heavy load. Throughout the whole process, there are no console errors or relevant (that I can see) syslog messages on either guest or host machine. If the "service network restart" doesn't work the first time, trying again (and again and again) never seems to work. The command completes normally, with normal output, but the interface stays frozen. However, a soft reboot of the guest machine (without restarting qemu-kvm) always seems to bring it back. I am aware of the "lowest mac address" assignment problem, where the bridge takes on the mac address of the slave interface with the lowest mac address. This causes temporary network freezes, but is definitely not what's happening for me. My freezes are permanent until manual intervention, and you can see from the 'ip addr show' output above that the mac address being used by br0 is that of the physical ethernet. There are no other virtual machines running on the host. I've verified that each virtual machine on the subnet has its own unique mac address. I have rebuilt the guest machine several times, and I have tried this on three different host machines (identical hardware, built identically). Oddly, I do have one virtual host (the second of this series) which never seemed to have a problem. It never had its network freeze when it was running the same rsync during its build. It's particularly odd because it was the second build. The first, on a different host, did have the freezing problem, but the second did not. I assumed at the time that I had done something wrong with the first build, and that the problem was resolved. Unfortunately, the problem reappeared when I built the third VM. Also unfortunately, I can't do many tests with the working VM, as it's now in production use, and I'm hoping I can find the cause of this issue before that machine starts having problems. It's possible that I just got really lucky while running the rsync on the working machine, and that one time it didn't freeze. Of course it's possible that I somehow changed the build scripts without realizing it and re-broke something, but I can't find any such thing. In any case, I'm hoping someone has some idea what could cause this. Addendum: Preliminary tests suggest that I don't have the problem if I substitute e1000 for virtio in the first -net flag to qemu-kvm. I don't consider this a solution, but it is suitable for a stopgap. Has anyone else had (or better yet, solved) this problem with the virtio network driver?

Read the article
Cisco ASA: How to route PPPoE-assigned subnet?

- by Martijn Heemels

We've just received a fiber uplink, and I'm trying to configure our Cisco ASA 5505 to properly use it. The provider requires us to connect via PPPoE, and I managed to configure the ASA as a PPPoE client and establish a connection. The ASA is assigned an IP address by PPPoE, and I can ping out from the ASA to the internet, but I should have access to an entire /28 subnet. I can't figure out how to get that subnet configured on the ASA, so that I can route or NAT the available public addresses to various internal hosts. My assigned range is: 188.xx.xx.176/28 The address I get via PPPoE is 188.xx.xx.177/32, which according to our provider is our Default Gateway address. They claim the subnet is correctly routed to us on their side. How does the ASA know which range it is responsible for on the Fiber interface? How do I use the addresses from my range? To clarify my config; The ASA is currently configured to default-route to our ADSL uplink on port Ethernet0/0 (interface vlan2, nicknamed Outside). The fiber is connected to port Ethernet0/2 (interface vlan50, nicknamed Fiber) so I can configure and test it before making it the default route. Once I'm clear on how to set it all up, I'll fully replace the Outside interface with Fiber. My config (rather long): : Saved : ASA Version 8.3(2)4 ! hostname gw domain-name example.com enable password ****** encrypted passwd ****** encrypted names name 10.10.1.0 Inside-dhcp-network description Desktops and clients that receive their IP via DHCP name 10.10.0.208 svn.example.com description Subversion server name 10.10.0.205 marvin.example.com description LAMP development server name 10.10.0.206 dns.example.com description DNS, DHCP, NTP ! interface Vlan2 description Old ADSL WAN connection nameif outside security-level 0 ip address 192.168.1.2 255.255.255.252 ! interface Vlan10 description LAN vlan 10 Regular LAN traffic nameif inside security-level 100 ip address 10.10.0.254 255.255.0.0 ! interface Vlan11 description LAN vlan 11 Lab/test traffic nameif lab security-level 90 ip address 10.11.0.254 255.255.0.0 ! interface Vlan20 description LAN vlan 20 ISCSI traffic nameif iscsi security-level 100 ip address 10.20.0.254 255.255.0.0 ! interface Vlan30 description LAN vlan 30 DMZ traffic nameif dmz security-level 50 ip address 10.30.0.254 255.255.0.0 ! interface Vlan40 description LAN vlan 40 Guests access to the internet nameif guests security-level 50 ip address 10.40.0.254 255.255.0.0 ! interface Vlan50 description New WAN Corporate Internet over fiber nameif fiber security-level 0 pppoe client vpdn group KPN ip address pppoe ! interface Ethernet0/0 switchport access vlan 2 speed 100 duplex full ! interface Ethernet0/1 switchport trunk allowed vlan 10,11,30,40 switchport trunk native vlan 10 switchport mode trunk ! interface Ethernet0/2 switchport access vlan 50 speed 100 duplex full ! interface Ethernet0/3 shutdown ! interface Ethernet0/4 shutdown ! interface Ethernet0/5 switchport access vlan 20 ! interface Ethernet0/6 shutdown ! interface Ethernet0/7 shutdown ! boot system disk0:/asa832-4-k8.bin ftp mode passive clock timezone CEST 1 clock summer-time CEDT recurring last Sun Mar 2:00 last Sun Oct 3:00 dns domain-lookup inside dns server-group DefaultDNS name-server dns.example.com domain-name example.com same-security-traffic permit inter-interface same-security-traffic permit intra-interface object network inside-net subnet 10.10.0.0 255.255.0.0 object network svn.example.com host 10.10.0.208 object network marvin.example.com host 10.10.0.205 object network lab-net subnet 10.11.0.0 255.255.0.0 object network dmz-net subnet 10.30.0.0 255.255.0.0 object network guests-net subnet 10.40.0.0 255.255.0.0 object network dhcp-subnet subnet 10.10.1.0 255.255.255.0 description DHCP assigned addresses on Vlan 10 object network Inside-vpnpool description Pool of assignable addresses for VPN clients object network vpn-subnet subnet 10.10.3.0 255.255.255.0 description Address pool assignable to VPN clients object network dns.example.com host 10.10.0.206 description DNS, DHCP, NTP object-group service iscsi tcp description iscsi storage traffic port-object eq 3260 access-list outside_access_in remark Allow access from outside to HTTP on svn. access-list outside_access_in extended permit tcp any object svn.example.com eq www access-list Insiders!_splitTunnelAcl standard permit 10.10.0.0 255.255.0.0 access-list iscsi_access_in remark Prevent disruption of iscsi traffic from outside the iscsi vlan. access-list iscsi_access_in extended deny tcp any interface iscsi object-group iscsi log warnings ! snmp-map DenyV1 deny version 1 ! pager lines 24 logging enable logging timestamp logging asdm-buffer-size 512 logging monitor warnings logging buffered warnings logging history critical logging asdm errors logging flash-bufferwrap logging flash-minimum-free 4000 logging flash-maximum-allocation 2000 mtu outside 1500 mtu inside 1500 mtu lab 1500 mtu iscsi 9000 mtu dmz 1500 mtu guests 1500 mtu fiber 1492 ip local pool DHCP_VPN 10.10.3.1-10.10.3.20 mask 255.255.0.0 ip verify reverse-path interface outside no failover icmp unreachable rate-limit 10 burst-size 5 asdm image disk0:/asdm-635.bin asdm history enable arp timeout 14400 nat (inside,outside) source static any any destination static vpn-subnet vpn-subnet ! object network inside-net nat (inside,outside) dynamic interface object network svn.example.com nat (inside,outside) static interface service tcp www www object network lab-net nat (lab,outside) dynamic interface object network dmz-net nat (dmz,outside) dynamic interface object network guests-net nat (guests,outside) dynamic interface access-group outside_access_in in interface outside access-group iscsi_access_in in interface iscsi route outside 0.0.0.0 0.0.0.0 192.168.1.1 1 timeout xlate 3:00:00 timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 icmp 0:00:02 timeout sunrpc 0:10:00 h323 0:05:00 h225 1:00:00 mgcp 0:05:00 mgcp-pat 0:05:00 timeout sip 0:30:00 sip_media 0:02:00 sip-invite 0:03:00 sip-disconnect 0:02:00 timeout sip-provisional-media 0:02:00 uauth 0:05:00 absolute timeout tcp-proxy-reassembly 0:01:00 dynamic-access-policy-record DfltAccessPolicy aaa-server SBS2003 protocol radius aaa-server SBS2003 (inside) host 10.10.0.204 timeout 5 key ***** aaa authentication enable console SBS2003 LOCAL aaa authentication ssh console SBS2003 LOCAL aaa authentication telnet console SBS2003 LOCAL http server enable http 10.10.0.0 255.255.0.0 inside snmp-server host inside 10.10.0.207 community ***** version 2c snmp-server location Server room snmp-server contact [email protected] snmp-server community ***** snmp-server enable traps snmp authentication linkup linkdown coldstart snmp-server enable traps syslog crypto ipsec transform-set TRANS_ESP_AES-256_SHA esp-aes-256 esp-sha-hmac crypto ipsec transform-set TRANS_ESP_AES-256_SHA mode transport crypto ipsec transform-set ESP-AES-256-MD5 esp-aes-256 esp-md5-hmac crypto ipsec transform-set ESP-DES-SHA esp-des esp-sha-hmac crypto ipsec transform-set ESP-DES-MD5 esp-des esp-md5-hmac crypto ipsec transform-set ESP-AES-192-MD5 esp-aes-192 esp-md5-hmac crypto ipsec transform-set ESP-3DES-MD5 esp-3des esp-md5-hmac crypto ipsec transform-set ESP-AES-256-SHA esp-aes-256 esp-sha-hmac crypto ipsec transform-set ESP-AES-128-SHA esp-aes esp-sha-hmac crypto ipsec transform-set ESP-AES-192-SHA esp-aes-192 esp-sha-hmac crypto ipsec transform-set ESP-AES-128-MD5 esp-aes esp-md5-hmac crypto ipsec transform-set ESP-3DES-SHA esp-3des esp-sha-hmac crypto ipsec security-association lifetime seconds 28800 crypto ipsec security-association lifetime kilobytes 4608000 crypto dynamic-map outside_dyn_map 20 set pfs group5 crypto dynamic-map outside_dyn_map 20 set transform-set TRANS_ESP_AES-256_SHA crypto dynamic-map SYSTEM_DEFAULT_CRYPTO_MAP 65535 set transform-set ESP-AES-128-SHA ESP-AES-128-MD5 ESP-AES-192-SHA ESP-AES-192-MD5 ESP-AES-256-SHA ESP-AES-256-MD5 ESP-3DES-SHA ESP-3DES-MD5 ESP-DES-SHA ESP-DES-MD5 crypto map outside_map 65535 ipsec-isakmp dynamic SYSTEM_DEFAULT_CRYPTO_MAP crypto map outside_map interface outside crypto isakmp enable outside crypto isakmp policy 1 authentication pre-share encryption 3des hash sha group 2 lifetime 86400 telnet 10.10.0.0 255.255.0.0 inside telnet timeout 5 ssh scopy enable ssh 10.10.0.0 255.255.0.0 inside ssh timeout 5 ssh version 2 console timeout 30 management-access inside vpdn group KPN request dialout pppoe vpdn group KPN localname INSIDERS vpdn group KPN ppp authentication pap vpdn username INSIDERS password ***** store-local dhcpd address 10.40.1.0-10.40.1.100 guests dhcpd dns 8.8.8.8 8.8.4.4 interface guests dhcpd update dns interface guests dhcpd enable guests ! threat-detection basic-threat threat-detection scanning-threat threat-detection statistics host number-of-rate 2 threat-detection statistics port number-of-rate 3 threat-detection statistics protocol number-of-rate 3 threat-detection statistics access-list threat-detection statistics tcp-intercept rate-interval 30 burst-rate 400 average-rate 200 ntp server dns.example.com source inside prefer webvpn group-policy DfltGrpPolicy attributes vpn-tunnel-protocol IPSec l2tp-ipsec group-policy Insiders! internal group-policy Insiders! attributes wins-server value 10.10.0.205 dns-server value 10.10.0.206 vpn-tunnel-protocol IPSec l2tp-ipsec split-tunnel-policy tunnelspecified split-tunnel-network-list value Insiders!_splitTunnelAcl default-domain value example.com username martijn password ****** encrypted privilege 15 username marcel password ****** encrypted privilege 15 tunnel-group DefaultRAGroup ipsec-attributes pre-shared-key ***** tunnel-group Insiders! type remote-access tunnel-group Insiders! general-attributes address-pool DHCP_VPN authentication-server-group SBS2003 LOCAL default-group-policy Insiders! tunnel-group Insiders! ipsec-attributes pre-shared-key ***** ! class-map global-class match default-inspection-traffic class-map type inspect http match-all asdm_medium_security_methods match not request method head match not request method post match not request method get ! ! policy-map type inspect dns preset_dns_map parameters message-length maximum 512 policy-map type inspect http http_inspection_policy parameters protocol-violation action drop-connection policy-map global-policy class global-class inspect dns inspect esmtp inspect ftp inspect h323 h225 inspect h323 ras inspect http inspect icmp inspect icmp error inspect mgcp inspect netbios inspect pptp inspect rtsp inspect snmp DenyV1 ! service-policy global-policy global smtp-server 123.123.123.123 prompt hostname context call-home profile CiscoTAC-1 no active destination address http https://tools.cisco.com/its/service/oddce/services/DDCEService destination address email [email protected] destination transport-method http subscribe-to-alert-group diagnostic subscribe-to-alert-group environment subscribe-to-alert-group inventory periodic monthly subscribe-to-alert-group configuration periodic monthly subscribe-to-alert-group telemetry periodic daily hpm topN enable Cryptochecksum:a76bbcf8b19019771c6d3eeecb95c1ca : end asdm image disk0:/asdm-635.bin asdm location svn.example.com 255.255.255.255 inside asdm location marvin.example.com 255.255.255.255 inside asdm location dns.example.com 255.255.255.255 inside asdm history enable

Read the article
mysqld crashes on any statement

- by ??iu

I restarted my slave to change configuration settings to skip reverse hostname lookup on connecting and to enable the slow query log. I edited /etc/my.cnf making only these changes, then restarted mysqld with /etc/init.d/mysql restart All appeared to be well but when I connect to msyqld remotely or locally though it connects okay a slight problem is that mysqld crashes whenever you try to issue any kind of statement. The client looks like: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.1.31-1ubuntu2-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show tables; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: mydb ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... ERROR 2003 (HY000): Can't connect to MySQL server on 'xx.xx.xx.xx' (61) ERROR: Can't connect to the server ERROR 2006 (HY000): MySQL server has gone away Bus error The mysqld error log looks like: 101210 16:35:51 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:35:51 InnoDB: Assertion failure in thread 140245598570832 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:35:51 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=3 max_threads=600 threads_connected=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18209220 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f8d791580d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f902a76a080] /lib/libc.so.6(gsignal+0x35) [0x7f90291f8fb5] /lib/libc.so.6(abort+0x183) [0x7f90291fabc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f902a7623ba] /lib/libc.so.6(clone+0x6d) [0x7f90292abfcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18213c70 = thd->thread_id=3 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:35:51 mysqld_safe Number of processes running now: 0 101210 16:35:51 mysqld_safe mysqld restarted InnoDB: The log sequence number in ibdata files does not match InnoDB: the log sequence number in the ib_logfiles! 101210 16:35:54 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... 101210 16:35:56 InnoDB: Started; log sequence number 456 143528628 101210 16:35:56 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 16:35:56 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 16:35:56 [Note] Event Scheduler: Loaded 0 events 101210 16:35:56 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 16:36:11 InnoDB: Error: (1500) Couldn't read the MAX(job_id) autoinc value from the index (PRIMARY). 101210 16:36:11 InnoDB: Assertion failure in thread 139955151501648 in file handler/ha_innodb.cc line 2595 InnoDB: Failing assertion: error == DB_SUCCESS InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 101210 16:36:11 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x18588720 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f49d916f0d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f4c8a73f080] /lib/libc.so.6(gsignal+0x35) [0x7f4c891cdfb5] /lib/libc.so.6(abort+0x183) [0x7f4c891cfbc3] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x41b) [0x781f4b] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f4c8a7373ba] /lib/libc.so.6(clone+0x6d) [0x7f4c89280fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x18599950 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 101210 16:36:11 mysqld_safe Number of processes running now: 0 101210 16:36:11 mysqld_safe mysqld restarted The config is [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] innodb_file_per_table innodb_buffer_pool_size=10G innodb_log_buffer_size=4M innodb_flush_log_at_trx_commit=2 innodb_thread_concurrency=8 skip-slave-start server-id=3 # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /DB2/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 128K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP max_connections = 600 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 32M # skip-federated slow-query-log skip-name-resolve Update: I followed the instructions as per http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html and set innodb_force_recovery = 4 and the logs are showing a different error but the behavior is still the same: 101210 19:14:15 mysqld_safe mysqld restarted 101210 19:14:19 InnoDB: Started; log sequence number 456 143528628 InnoDB: !!! innodb_force_recovery is set to 4 !!! 101210 19:14:19 [Warning] 'user' entry 'root@PSDB102' ignored in --skip-name-resolve mode. 101210 19:14:19 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem. 101210 19:14:19 [Note] Event Scheduler: Loaded 0 events 101210 19:14:19 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.1.31-1ubuntu2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 (Ubuntu) 101210 19:14:32 InnoDB: error: space object of table mydb/__twitter_friend, InnoDB: space id 1602 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/access_request, InnoDB: space id 1318 did not exist in memory. Retrying an open. 101210 19:14:32 InnoDB: error: space object of table mydb/activity, InnoDB: space id 1595 did not exist in memory. Retrying an open. 101210 19:14:32 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=1 max_threads=600 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1328077 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x1753c070 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f7a0b5800d0 thread_stack 0x20000 /usr/sbin/mysqld(my_print_stacktrace+0x29) [0x8b4f89] /usr/sbin/mysqld(handle_segfault+0x383) [0x5f8f03] /lib/libpthread.so.0 [0x7f7cbc350080] /usr/sbin/mysqld(ha_innobase::innobase_get_index(unsigned int)+0x46) [0x77c516] /usr/sbin/mysqld(ha_innobase::innobase_initialize_autoinc()+0x40) [0x77c640] /usr/sbin/mysqld(ha_innobase::open(char const*, int, unsigned int)+0x3f3) [0x781f23] /usr/sbin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f) [0x6db00f] /usr/sbin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, bool)+0x57a) [0x64760a] /usr/sbin/mysqld [0x63f281] /usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x626) [0x641e16] /usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x5db) [0x6429cb] /usr/sbin/mysqld(open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int)+0x1e) [0x642b0e] /usr/sbin/mysqld(mysqld_list_fields(THD*, TABLE_LIST*, char const*)+0x22) [0x70b292] /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x146d) [0x60dc1d] /usr/sbin/mysqld(do_command(THD*)+0xe8) [0x60dda8] /usr/sbin/mysqld(handle_one_connection+0x226) [0x601426] /lib/libpthread.so.0 [0x7f7cbc3483ba] /lib/libc.so.6(clone+0x6d) [0x7f7cbae91fcd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x1754d690 = thd->thread_id=1 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.

Read the article
Oracle Support Master Note for Troubleshooting Advanced Queuing and Oracle Streams Propagation Issues (Doc ID 233099.1)

- by faye.todd(at)oracle.com

Master Note for Troubleshooting Advanced Queuing and Oracle Streams Propagation Issues (Doc ID 233099.1) Copyright (c) 2010, Oracle Corporation. All Rights Reserved. In this Document Purpose Last Review Date Instructions for the Reader Troubleshooting Details 1. Scope and Application 2. Definitions and Classifications 3. How to Use This Guide 4. Basic AQ Propagation Troubleshooting 5. Additional Troubleshooting Steps for AQ Propagation of User-Enqueued and Dequeued Messages 6. Additional Troubleshooting Steps for Propagation in an Oracle Streams Environment 7. Performance Issues References Applies to: Oracle Server - Enterprise Edition - Version: 8.1.7.0 to 11.2.0.2 - Release: 8.1.7 to 11.2Information in this document applies to any platform. Purpose This document presents a step-by-step methodology for troubleshooting and resolving problems with Advanced Queuing Propagation in both Streams and basic Advanced Queuing environments. It also serves as a master reference for other more specific notes on Oracle Streams Propagation and Advanced Queuing Propagation issues. Last Review Date December 20, 2010 Instructions for the Reader A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting. Troubleshooting Details 1. Scope and Application This note is intended for Database Administrators of Oracle databases where issues are being encountered with propagating messages between advanced queues, whether the queues are used for user-created messaging systems or for Oracle Streams. It contains troubleshooting steps and links to notes for further problem resolution.It can also be used a template to document a problem when it is necessary to engage Oracle Support Services. Knowing what is NOT happening can frequently speed up the resolution process by focusing solely on the pertinent problem area. This guide is divided into five parts: Section 2: Definitions and Classifications (discusses the different types and features of propagations possible - helpful for understanding the rest of the guide) Section 3: How to Use this Guide (to be used as a start part for determining the scope of the problem and what sections to consult) Section 4. Basic AQ propagation troubleshooting (applies to both AQ propagation of user enqueued and dequeued messages as well as Oracle Streams propagations) Section 5. Additional troubleshooting steps for AQ propagation of user enqueued and dequeued messages Section 6. Additional troubleshooting steps for Oracle Streams propagation Section 7. Performance issues 2. Definitions and Classifications Given the potential scope of issues that can be encountered with AQ propagation, the first recommended step is to do some basic diagnosis to determine the type of problem that is being encountered. 2.1. What Type of Propagation is Being Used? 2.1.1. Buffered Messaging For an advanced queue, messages can be maintained on disk (persistent messaging) or in memory (buffered messaging). To determine if a queue is buffered or not, reference the GV_$BUFFERED_QUEUES view. If the queue does not appear in this view, it is persistent. 2.1.2. Propagation mode - queue-to-dblink vs queue-to-queue As of 10.2, an AQ propagation can also be defined as queue-to-dblink, or queue-to-queue: queue-to-dblink: The propagation delivers messages or events from the source queue to all subscribing queues at the destination database identified by the dblink. A single propagation schedule is used to propagate messages to all subscribing queues. Hence any changes made to this schedule will affect message delivery to all the subscribing queues. This mode does not support multiple propagations from the same source queue to the same target database. queue-to-queue: Added in 10.2, this propagation mode delivers messages or events from the source queue to a specific destination queue identified on the database link. This allows the user to have fine-grained control on the propagation schedule for message delivery. This new propagation mode also supports transparent failover when propagating to a destination Oracle RAC system. With queue-to-queue propagation, you are no longer required to re-point a database link if the owner instance of the queue fails on Oracle RAC. This mode supports multiple propagations to the same target database if the target queues are different. The default is queue-to-dblink. To verify if queue-to-queue propagation is being used, in non-Streams environments query DBA_QUEUE_SCHEDULES.DESTINATION - if a remote queue is listed along with the remote database link, then queue-to-queue propagation is being used. For Streams environments, the DBA_PROPAGATION.QUEUE_TO_QUEUE column can be checked.See the following note for a method to switch between the two modes:Document 827473.1 How to alter propagation from queue-to-queue to queue-to-dblink 2.1.3. Combined Capture and Apply (CCA) for Streams In 11g Oracle Streams environments, an optimization called Combined Capture and Apply (CCA) is implemented by default when possible. Although a propagation is configured in this case, Streams does not use it; instead it passes information directly from capture to an apply receiver. To see if CCA is in use: COLUMN CAPTURE_NAME HEADING 'Capture Name' FORMAT A30COLUMN OPTIMIZATION HEADING 'CCA Mode?' FORMAT A10SELECT CAPTURE_NAME, DECODE(OPTIMIZATION,0, 'No','Yes') OPTIMIZATIONFROM V$STREAMS_CAPTURE; Also, see the following note:Document 463820.1 Streams Combined Capture and Apply in 11g 2.2. Queue Table Compatibility There are three types of queue table compatibility. In more recent databases, queue tables may be present in all three modes of compatibility: 8.0 - earliest version, deprecated in 10.2 onwards 8.1 - support added for RAC, asynchronous notification, secure queues, queue level access control, rule-based subscribers, separate storage of history information 10.0 - if the database is in 10.1-compatible mode, then the default value for queue table compatibility is 10.0 2.3. Single vs Multiple Consumer Queue Tables If more than one recipient can dequeue a message from a queue, then its queue table is multiple consumer. You can propagate messages from a multiple-consumer queue to a single-consumer queue. Propagation from a single-consumer queue to a multiple-consumer queue is not possible. 3. How to Use This Guide 3.1. Are Messages Being Propagated at All, or is the Propagation Just Slow? Run the following query on the source database for the propagation (assuming that it is running): select TOTAL_NUMBER from DBA_QUEUE_SCHEDULES where QNAME='<source_queue_name>'; If TOTAL_NUMBER is increasing, then propagation is most likely functioning, although it may be slow. For performance issues, see Section 7. 3.2. Propagation Between Persistent User-Created Queues See Sections 4 and 5 (and optionally Section 6 if performance is an issue). 3.3. Propagation Between Buffered User-Created Queues See Sections 4, 5, and 6 (and optionally Section 7 if performance is an issue). 3.4. Propagation between Oracle Streams Queues (without Combined Capture and Apply (CCA) Optimization) See Sections 4 and 6 (and optionally Section 7 if performance is an issue). 3.5. Propagation between Oracle Streams Queues (with Combined Capture and Apply (CCA) Optimization) Although an AQ propagation is not used directly in this case, some characteristics of the message transfer are inferred from the propagation parameters used. Some parts of Sections 4 and 6 still apply. 3.6. Messaging Gateway Propagations This note does not apply to Messaging Gateway propagations. 4. Basic AQ Propagation Troubleshooting 4.1. Double-check Your Code Make sure that you are consistent in your usage of the database link(s) names, queue names, etc. It may be useful to plot a diagram of which queues are connected via which database links to make sure that the logical structure is correct. 4.2. Verify that Job Queue Processes are Running 4.2.1. Versions 10.2 and Lower - DBA_JOBS Package For versions 10.2 and lower, a scheduled propagation is managed by DBMS_JOB package. The propagation is performed by job queue process background processes. Therefore we need to verify that there are sufficient processes available for the propagation process. We should have at least 4 job queue processes running and preferably more depending on the number of other jobs running in the database. It should be noted that for AQ specific work, AQ will only ever use half of the job queue processes available.An issue caused by an inadequate job queue processes parameter setting is described in the following note:Document 298015.1 Kwqjswproc:Excep After Loop: Assigning To Self 4.2.1.1. Job Queue Processes in Initalization Parameter File The parameter JOB_QUEUE_PROCESSES in the init.ora/spfile should be > 0. The value can be changed dynamically via connect / as sysdbaalter system set JOB_QUEUE_PROCESSES=10; 4.2.1.2. Job Queue Processes in Memory The following command will show how many job queue processes are currentlyin use by this instance (this may be different than what is in the init.ora/spfile): connect / as sysdbashow parameter job; 4.2.1.3. OS PIDs Corresponding to Job Queue Processes Identify the operating system process ids (spids) of job queue processes involved in propagation via select p.SPID, p.PROGRAM from V$PROCESS p, DBA_JOBS_RUNNING jr, V$SESSION s, DBA_JOBS j where s.SID=jr.SID and s.PADDR=p.ADDR and jr.JOB=j.JOBand j.WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%'; and these SPIDs can be used to check at the operating system level that they exist.In 8i a job queue process will have a name similar to: ora_snp1_<instance_name>.In 9i onwards you will see a coordinator process: ora_cjq0_ and multiple slave processes: ora_jnnn_<instance_name>, where nnn is an integer between 1 and 999. 4.2.2. Version 11.1 and Above - Oracle Scheduler In version 11.1 and above, Oracle Scheduler is used to perform AQ and Streams propagations. Oracle Scheduler automatically tunes the number of slave processes for these jobs based on the load on the computer system, and the JOB_QUEUE_PROCESSES initialization parameter is only used to specify the maximum number of slave processes. Therefore, the JOB_QUEUE_PROCESSES initialization parameter does not need to be set (it defaults to a very high number), unless you want to limit the number of slaves that can be created. If JOB_QUEUE_PROCESSES = 0, no propagation jobs will run.See the following note for a discussion of Oracle Streams 11g and Oracle Scheduler:Document 1083608.1 11g Streams and Oracle Scheduler 4.2.2.1. Job Queue Processes in Initalization Parameter File The parameter JOB_QUEUE_PROCESSES in the init.ora/spfile should be > 0, and preferably be left at its default value. The value can be changed dynamically via connect / as sysdbaalter system set JOB_QUEUE_PROCESSES=10; To set the JOB_QUEUE_PROCESSES parameter to its default value, run: connect / as sysdbaalter system reset JOB_QUEUE_PROCESSES; and then bounce the instance. 4.2.2.2. Job Queue Processes in Memory The following command will show how many job queue processes are currently in use by this instance (this may be different than what is in the init.ora/spfile): connect / as sysdbashow parameter job; 4.2.2.3. OS PIDs Corresponding to Job Queue Processes Identify the operating system process ids (SPIDs) of job queue processes involved in propagation via col PROGRAM for a30select p.SPID, p.PROGRAM, j.JOB_namefrom v$PROCESS p, DBA_SCHEDULER_RUNNING_JOBS jr, V$SESSION s, DBA_SCHEDULER_JOBS j where s.SID=jr.SESSION_ID and s.PADDR=p.ADDRand jr.JOB_name=j.JOB_NAME and j.JOB_NAME like '%AQ_JOB$_%'; and these SPIDs can be used to check at the operating system level that they exist.You will see a coordinator process: ora_cjq0_ and multiple slave processes: ora_jnnn_<instance_name>, where nnn is an integer between 1 and 999. 4.3. Check the Alert Log and Any Associated Trace Files The first place to check for propagation failures is the alert logs at all sites (local and if relevant all remote sites). When a job queue process attempts to execute a schedule and fails it will always write an error stack to the alert log. This error stack will also be written in a job queue process trace file, which will be written to the BACKGROUND_DUMP_DEST location for 10.2 and below, and in the DIAGNOSTIC_DEST location for 11g. The fact that errors are written to the alert log demonstrates that the schedule is executing. This means that the problem could be with the set up of the schedule. In this example the ORA-02068 demonstrates that the failure was at the remote site. Further investigation revealed that the remote database was not open, hence the ORA-03114 error. Starting the database resolved the problem. Thu Feb 14 10:40:05 2002 Propagation Schedule for (AQADM.MULTIPLEQ, SHANE816.WORLD) encountered following error:ORA-04052: error occurred when looking up Remote object [email protected]: error occurred at recursive SQL level 4ORA-02068: following severe error from SHANE816ORA-03114: not connected to ORACLEORA-06512: at "SYS.DBMS_AQADM_SYS", line 4770ORA-06512: at "SYS.DBMS_AQADM", line 548ORA-06512: at line 1 Other potential errors that may be written to the alert log can be found in the following notes:Document 827184.1 AQ Propagation with CLOB data types Fails with ORA-22990 (11.1)Document 846297.1 AQ Propagation Fails : ORA-00600[kope2upic2954] or Ora-00600[Kghsstream_copyn] (10.2, 11.1)Document 731292.1 ORA-25215 Reported on Local Propagation When Using Transformation with ANYDATA queue tables (10.2, 11.1, 11.2)Document 365093.1 ORA-07445 [kwqppay2aqe()+7360] Reported on Propagation of a Transformed Message (10.1, 10.2)Document 219416.1 Advanced Queuing Propagation Fails with ORA-22922 (9.0)Document 1203544.1 AQ Propagation Aborted with ORA-600 [ociksin: invalid status] on SYS.DBMS_AQADM_SYS.AQ$_PROPAGATION_PROCEDURE After Upgrade (11.1, 11.2)Document 1087324.1 ORA-01405 ORA-01422 reported by Advanced Queuing Propagation schedules after RAC reconfiguration (10.2)Document 1079577.1 Advanced Queuing Propagation Fails With "ORA-22370 incorrect usage of method" (9.2, 10.2, 11.1, 11.2)Document 332792.1 ORA-04061 error relating to SYS.DBMS_PRVTAQIP reported when setting up Statspack (8.1, 9.0, 9.2, 10.1)Document 353325.1 ORA-24056: Internal inconsistency for QUEUE <queue_name> and destination <dblink> (8.1, 9.0, 9.2, 10.1, 10.2, 11.1, 11.2)Document 787367.1 ORA-22275 reported on Propagating Messages with LOB component when propagating between 10.1 and 10.2 (10.1, 10.2)Document 566622.1 ORA-22275 when propagating >4K AQ$_JMS_TEXT_MESSAGEs from 9.2.0.8 to 10.2.0.1 (9.2, 10.1)Document 731539.1 ORA-29268: HTTP client error 401 Unauthorized Error when the AQ Servlet attempts to Propagate a message via HTTP (9.0, 9.2, 10.1, 10.2, 11.1)Document 253131.1 Concurrent Writes May Corrupt LOB Segment When Using Auto Segment Space Management (ORA-1555) (9.2)Document 118884.1 How to unschedule a propagation schedule stuck in pending stateDocument 222992.1 DBMS_AQADM.DISABLE_PROPAGATION_SCHEDULE Returns ORA-24082Document 282987.1 Propagated Messages marked UNDELIVERABLE after Drop and Recreate Of Remote QueueDocument 1204080.1 AQ Propagation Failing With ORA-25329 After Upgraded From 8i or 9i to 10g or 11g.Document 1233675.1 AQ Propagation stops after upgrade to 11.2.0.1 ORA-30757 4.3.1. Errors Related to Incorrect Network Configuration The most common propagation errors result from an incorrect network configuration. The list below contains common errors caused by tnsnames.ora file or database links being configured incorrectly: - ORA-12154: TNS:could not resolve service name- ORA-12505: TNS:listener does not currently know of SID given in connect descriptor- ORA-12514: TNS:listener could not resolve SERVICE_NAME - ORA-12541: TNS-12541 TNS:no listener 4.4. Check the Database Links Exist and are Functioning Correctly For schedules to remote databases confirm the database link exists via. SQL> col DBLINK for a45SQL> select QNAME, NVL(REGEXP_SUBSTR(DESTINATION, '[^@]+', 1, 2), DESTINATION) dblink2 from DBA_QUEUE_SCHEDULES3 where MESSAGE_DELIVERY_MODE = 'PERSISTENT';QNAME DBLINK------------------------------ ---------------------------------------------MY_QUEUE ORCL102B.WORLD Connect as the owner of the link and select across it to verify it works and connects to the database we expect. i.e. select * from ALL_QUEUES@ ORCL102B.WORLD; You need to ensure that the userid that scheduled the propagation (using DBMS_AQADM.SCHEDULE_PROPAGATION or DBMS_PROPAGATION_ADM.CREATE_PROPAGATION if using Streams) has access to the database link for the destination. 4.5. Has Propagation Been Correctly Scheduled? Check that the propagation schedule has been created and that a job queue process has been assigned. Look for the entry in DBA_QUEUE_SCHEDULES and SYS.AQ$_SCHEDULES for your schedule. For 10g and below, check that it has a JOBNO entry in SYS.AQ$_SCHEDULES, and that there is an entry in DBA_JOBS with that JOBNO. For 11g and above, check that the schedule has a JOB_NAME entry in SYS.AQ$_SCHEDULES, and that there is an entry in DBA_SCHEDULER_JOBS with that JOB_NAME. Check the destination is as intended and spelled correctly. SQL> select SCHEMA, QNAME, DESTINATION, SCHEDULE_DISABLED, PROCESS_NAME from DBA_QUEUE_SCHEDULES;SCHEMA QNAME DESTINATION S PROCESS------- ---------- ------------------ - -----------AQADM MULTIPLEQ AQ$_LOCAL N J000 AQ$_LOCAL in the destination column shows that the queue to which we are propagating to is in the same database as the source queue. If the propagation was to a remote (different) database, a database link will be in the DESTINATION column. The entry in the SCHEDULE_DISABLED column, N, means that the schedule is NOT disabled. If Y (yes) appears in this column, propagation is disabled and the schedule will not be executed. If not using Oracle Streams, propagation should resume once you have enabled the schedule by invoking DBMS_AQADM.ENABLE_PROPAGATION_SCHEDULE (for 10.2 Oracle Streams and above, the DBMS_PROPAGATION_ADM.START_PROPAGATION procedure should be used). The PROCESS_NAME is the name of the job queue process currently allocated to execute the schedule. This process is allocated dynamically at execution time. If the PROCESS_NAME column is null (empty) the schedule is not currently executing. You may need to execute this statement a number of times to verify if a process is being allocated. If a process is at some time allocated to the schedule, it is attempting to execute. SQL> select SCHEMA, QNAME, LAST_RUN_DATE, NEXT_RUN_DATE from DBA_QUEUE_SCHEDULES;SCHEMA QNAME LAST_RUN_DATE NEXT_RUN_DATE------ ----- ----------------------- ----------------------- AQADM MULTIPLEQ 13-FEB-2002 13:18:57 13-FEB-2002 13:20:30 In 11g, these dates are expressed in TIMESTAMP WITH TIME ZONE datatypes. If the NEXT_RUN_DATE and NEXT_RUN_TIME columns are null when this statement is executed, the scheduled propagation is currently in progress. If they never change it would suggest that the schedule itself is never executing. If the next scheduled execution is too far away, change the NEXT_TIME parameter of the schedule so that schedules are executed more frequently (assuming that the window is not set to be infinite). Parameters of a schedule can be changed using the DBMS_AQADM.ALTER_PROPAGATION_SCHEDULE call. In 10g and below, scheduling propagation posts a job in the DBA_JOBS view. The columns are more or less the same as DBA_QUEUE_SCHEDULES so you just need to recognize the job and verify that it exists. SQL> select JOB, WHAT from DBA_JOBS where WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%';JOB WHAT---- ----------------- 720 next_date := sys.dbms_aqadm.aq$_propaq(job); For 11g, scheduling propagation posts a job in DBA_SCHEDULER_JOBS instead: SQL> select JOB_NAME from DBA_SCHEDULER_JOBS where JOB_NAME like 'AQ_JOB$_%';JOB_NAME------------------------------AQ_JOB$_41 If no job exists, check DBA_QUEUE_SCHEDULES to make sure that the schedule has not been disabled. For 10g and below, the job number is dynamic for AQ propagation schedules. The procedure that is executed to expedite a propagation schedule runs, removes itself from DBA_JOBS, and then reposts a new job for the next scheduled propagation. The job number should therefore always increment unless the schedule has been set up to run indefinitely. 4.6. Is the Schedule Executing but Failing to Complete? Run the following query: SQL> select FAILURES, LAST_ERROR_MSG from DBA_QUEUE_SCHEDULES;FAILURES LAST_ERROR_MSG------------ -----------------------1 ORA-25207: enqueue failed, queue AQADM.INQ is disabled from enqueueingORA-02063: preceding line from SHANE816 The failures column shows how many times we have attempted to execute the schedule and failed. Oracle will attempt to execute the schedule 16 times after which it will be removed from the DBA_JOBS or DBA_SCHEDULER_JOBS view and the schedule will become disabled. The column DBA_QUEUE_SCHEDULES.SCHEDULE_DISABLED will show 'Y'. For 11g and above, the DBA_SCHEDULER_JOBS.STATE column will show 'BROKEN' for the job corresponding to DBA_QUEUE_SCHEDULES.JOB_NAME. Prior to 10g the back off algorithm for failures was exponential, whereas from 10g onwards it is linear. The propagation will become disabled on the 17th attempt. Only the last execution failure will be reflected in the LAST_ERROR_MSG column. That is, if the schedule fails 5 times for 5 different reasons, only the last set of errors will be recorded in DBA_QUEUE_SCHEDULES. Any errors need to be resolved to allow propagation to continue. If propagation has also become disabled due to 17 failures, first resolve the reason for the error and then re-enable the schedule using the DBMS_AQADM.ENABLE_PROPAGATION_SCHEDULE procedure, or DBMS_PROPAGATION_ADM.START_PROPAGATION if using 10.2 or above Oracle Streams. As soon as the schedule executes successfully the error message entries will be deleted. Oracle does not keep a history of past failures. However, when using Oracle Streams, the errors will be retained in the DBA_PROPAGATION view even after the schedule resumes successfully. See the following note for instructions on how to clear out the errors from the DBA_PROPAGATION view:Document 808136.1 How to clear the old errors from DBA_PROPAGATION view?If a schedule is active and no errors are being reported then the source queue may not have any messages to be propagated. 4.7. Do the Propagation Notification Queue Table and Queue Exist? Check to see that the propagation notification queue table and queue exist and are enabled for enqueue and dequeue. Propagation makes use of the propagation notification queue for handling propagation run-time events, and the messages in this queue are stored in a SYS-owned queue table. This queue should never be stopped or dropped and the corresponding queue table never be dropped. 10g and belowThe propagation notification queue table is of the format SYS.AQ$_PROP_TABLE_n, where 'n' is the RAC instance number, i.e. '1' for a non-RAC environment. This queue and queue table are created implicitly when propagation is first scheduled. If propagation has been scheduled and these objects do not exist, try unscheduling and rescheduling propagation. If they still do not exist contact Oracle Support. SQL> select QUEUE_TABLE from DBA_QUEUE_TABLES2 where QUEUE_TABLE like '%PROP_TABLE%' and OWNER = 'SYS';QUEUE_TABLE------------------------------AQ$_PROP_TABLE_1SQL> select NAME, ENQUEUE_ENABLED, DEQUEUE_ENABLED2 from DBA_QUEUES where owner='SYS'3 and QUEUE_TABLE like '%PROP_TABLE%';NAME ENQUEUE DEQUEUE------------------------------ ------- -------AQ$_PROP_NOTIFY_1 YES YESAQ$_AQ$_PROP_TABLE_1_E NO NO If the AQ$_PROP_NOTIFY_1 queue is not enabled for enqueue or dequeue, it should be so enabled using DBMS_AQADM.START_QUEUE. However, the exception queue AQ$_AQ$_PROP_TABLE_1_E should not be enabled for enqueue or dequeue.11g and aboveThe propagation notification queue table is of the format SYS.AQ_PROP_TABLE, and is created when the database is created. If they do not exist, contact Oracle Support. SQL> select QUEUE_TABLE from DBA_QUEUE_TABLES2 where QUEUE_TABLE like '%PROP_TABLE%' and OWNER = 'SYS';QUEUE_TABLE------------------------------AQ_PROP_TABLESQL> select NAME, ENQUEUE_ENABLED, DEQUEUE_ENABLED2 from DBA_QUEUES where owner='SYS'3 and QUEUE_TABLE like '%PROP_TABLE%';NAME ENQUEUE DEQUEUE------------------------------ ------- -------AQ_PROP_NOTIFY YES YESAQ$_AQ_PROP_TABLE_E NO NO If the AQ_PROP_NOTIFY queue is not enabled for enqueue or dequeue, it should be so enabled using DBMS_AQADM.START_QUEUE. However, the exception queue AQ$_AQ$_PROP_TABLE_E should not be enabled for enqueue or dequeue. 4.8. Does the Remote Queue Exist and is it Enabled for Enqueueing? Check that the remote queue the propagation is transferring messages to exists and is enabled for enqueue: SQL> select DESTINATION from USER_QUEUE_SCHEDULES where QNAME = 'OUTQ';DESTINATION-----------------------------------------------------------------------------"AQADM"."INQ"@M2V102.ESSQL> select OWNER, NAME, ENQUEUE_ENABLED, DEQUEUE_ENABLED from [email protected];OWNER NAME ENQUEUE DEQUEUE-------- ------ ----------- -----------AQADM INQ YES YES 4.9. Do the Target and Source Database Charactersets Differ? If a message fails to propagate, check the database charactersets of the source and target databases. Investigate whether the same message can propagate between the databases with the same characterset or it is only a particular combination of charactersets which causes a problem. 4.10. Check the Queue Table Type Agreement Propagation is not possible between queue tables which have types that differ in some respect. One way to determine if this is the case is to run the DBMS_AQADM.VERIFY_QUEUE_TYPES procedure for the two queues that the propagation operates on. If the types do not agree, DBMS_AQADM.VERIFY_QUEUE_TYPES will return '0'.For AQ propagation between databases which have different NLS_LENGTH_SEMANTICS settings, propagation will not work, unless the queues are Oracle Streams ANYDATA queues.See the following notes for issues caused by lack of type agreement:Document 1079577.1 Advanced Queuing Propagation Fails With "ORA-22370: incorrect usage of method"Document 282987.1 Propagated Messages marked UNDELIVERABLE after Drop and Recreate Of Remote QueueDocument 353754.1 Streams Messaging Propagation Fails between Single and Multi-byte Charactersets when using Chararacter Length Semantics in the ADT 4.11. Enable Propagation Tracing 4.11.1. System Level This is set it in the init.ora/spfile as follows: event="24040 trace name context forever, level 10" and restart the instanceThis event cannot be set dynamically with an alter system command until version 10.2: SQL> alter system set events '24040 trace name context forever, level 10'; To unset the event: SQL> alter system set events '24040 trace name context off'; Debugging information will be logged to job queue trace file(s) (jnnn) as propagation takes place. You can check the trace file for errors, and for statements indicating that messages have been sent. For the most part the trace information is understandable. This trace should also be uploaded to Oracle Support if a service request is created. 4.11.2. Attaching to a Specific Process We can also attach to an existing job queue processes that is running a propagation schedule and trace it individually using the oradebug utility, as follows:10.2 and below connect / as sysdbaselect p.SPID, p.PROGRAM from v$PROCESS p, DBA_JOBS_RUNNING jr, V$SESSION s, DBA_JOBS j where s.SID=jr.SID and s.PADDR=p.ADDR and jr.JOB=j.JOB and j.WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%';-- For the process id (SPID) attach to it via oradebug and generate the following traceoradebug setospid <SPID>oradebug unlimitoradebug Event 10046 trace name context forever, level 12oradebug Event 24040 trace name context forever, level 10-- Trace the process for 5 minutesoradebug Event 10046 trace name context offoradebug Event 24040 trace name context off-- The following command returns the pathname/filename to the file being written tooradebug tracefile_name 11g connect / as sysdbacol PROGRAM for a30select p.SPID, p.PROGRAM, j.JOB_NAMEfrom v$PROCESS p, DBA_SCHEDULER_RUNNING_JOBS jr, V$SESSION s, DBA_SCHEDULER_JOBS j where s.SID=jr.SESSION_ID and s.PADDR=p.ADDR and jr.JOB_NAME=j.JOB_NAME and j.JOB_NAME like '%AQ_JOB$_%';-- For the process id (SPID) attach to it via oradebug and generate the following traceoradebug setospid <SPID>oradebug unlimitoradebug Event 10046 trace name context forever, level 12oradebug Event 24040 trace name context forever, level 10-- Trace the process for 5 minutesoradebug Event 10046 trace name context offoradebug Event 24040 trace name context off-- The following command returns the pathname/filename to the file being written tooradebug tracefile_name 4.11.3. Further Tracing The previous tracing steps only trace the job queue process executing the propagation on the source. At times it is useful to trace the propagation receiver process (the session which is enqueueing the messages into the target queue) on the target database which is associated with the job queue process on the source database.These following queries provide ways of identifying the processes involved in propagation so that you can attach to them via oradebug to generate trace information.In order to identify the propagation receiver process you need to execute the query as a user with privileges to access the v$ views in both the local and remote databases so the database link must connect as a user with those privileges in the remote database. The <DBLINK> in the queries should be replaced by the appropriate database link.The queries have two forms due to the differences between operating systems. The value returned by 'Rem Process' is the operating system identifier of the propagation receiver on the remote database. Once identified, this process can be attached to and traced on the remote database using the commands given in Section 4.11.2.10.2 and below - Windows select pl.SPID "JobQ Process", pl.PROGRAM, sr.PROCESS "Rem Process" from v$PROCESS pl, DBA_JOBS_RUNNING jr, V$SESSION s, DBA_JOBS j, V$SESSION@<DBLINK> sr where s.SID=jr.SID and s.PADDR=pl.ADDR and jr.JOB=j.JOB and j.WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%' and pl.SPID=substr(sr.PROCESS, instr(sr.PROCESS,':')+1); 10.2 and below - Unix select pl.SPID "JobQ Process", pl.PROGRAM, sr.PROCESS "Rem Process" from V$PROCESS pl, DBA_JOBS_RUNNING jr, V$SESSION s, DBA_JOBS j, V$SESSION@<DBLINK> sr where s.SID=jr.SID and s.PADDR=pl.ADDR and jr.JOB=j.JOB and j.WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%' and pl.SPID=sr.PROCESS; 11g - Windows select pl.SPID "JobQ Process", pl.PROGRAM, sr.PROCESS "Rem Process" from V$PROCESS pl, DBA_SCHEDULER_RUNNING_JOBS jr, V$SESSION s, DBA_SCHEDULER_JOBS j, V$SESSION@<DBLINK> sr where s.SID=jr.SESSION_ID and s.PADDR=pl.ADDR and jr.JOB_NAME=j.JOB_NAME and j.JOB_NAME like '%AQ_JOB$_%%' and pl.SPID=substr(sr.PROCESS, instr(sr.PROCESS,':')+1); 11g - Unix select pl.SPID "JobQ Process", pl.PROGRAM, sr.PROCESS "Rem Process" from V$PROCESS pl, DBA_SCHEDULER_RUNNING_JOBS jr, V$SESSION s, DBA_SCHEDULER_JOBS j, V$SESSION@<DBLINK> sr where s.SID=jr.SESSION_ID and s.PADDR=pl.ADDR and jr.JOB_NAME=j.JOB_NAME and j.JOB_NAME like '%AQ_JOB$_%%' and pl.SPID=sr.PROCESS; 5. Additional Troubleshooting Steps for AQ Propagation of User-Enqueued and Dequeued Messages 5.1. Check the Privileges of All Users Involved Ensure that the owner of the database link has the necessary privileges on the aq packages. SQL> select TABLE_NAME, PRIVILEGE from USER_TAB_PRIVS;TABLE_NAME PRIVILEGE------------------------------ ----------------------------------------DBMS_LOCK EXECUTEDBMS_AQ EXECUTEDBMS_AQADM EXECUTEDBMS_AQ_BQVIEW EXECUTEQT52814_BUFFER SELECT Note that when queue table is created, a view called QT<nnn>_BUFFER is created in the SYS schema, and the queue table owner is given SELECT privileges on it. The <nnn> corresponds to the object_id of the associated queue table. SQL> select * from USER_ROLE_PRIVS;USERNAME GRANTED_ROLE ADM DEF OS_------------------------------ ------------------------------ ---- ---- ---AQ_USER1 AQ_ADMINISTRATOR_ROLE NO YES NOAQ_USER1 CONNECT NO YES NOAQ_USER1 RESOURCE NO YES NO It is good practice to configure central AQ administrative user. All admin and processing jobs are created, executed and administered as this user. This configuration is not mandatory however, and the database link can be owned by any existing queue user. If this latter configuration is used, ensure that the connecting user has the necessary privileges on the AQ packages and objects involved. Privileges for an AQ Administrative user Execute on DBMS_AQADM Execute on DBMS_AQ Granted the AQ_ADMINISTRATOR_ROLE Privileges for an AQ user Execute on DBMS_AQ Execute on the message payload Enqueue privileges on the remote queue Dequeue privileges on the originating queue Privileges need to be confirmed on both sites when propagation is scheduled to remote destinations. Verify that the user ID used to login to the destination through the database link has been granted privileges to use AQ. 5.2. Verify Queue Payload Types AQ will not propagate messages from one queue to another if the payload types of the two queues are not verified to be equivalent. An AQ administrator can verify if the source and destination's payload types match by executing the DBMS_AQADM.VERIFY_QUEUE_TYPES procedure. The results of the type checking will be stored in the SYS.AQ$_MESSAGE_TYPES table. This table can be accessed using the object identifier OID of the source queue and the address database link of the destination queue, i.e. [schema.]queue_name[@destination]. Prior to Oracle 9i the payload (message type) had to be the same for all the queue tables involved in propagation. From Oracle9i onwards a transformation can be used so that payloads can be converted from one type to another. The following procedural call made on the source database can verify whether we can propagate between the source and the destination queue tables. connect aq_user1/[email protected] serverout onDECLARErc_value number;BEGINDBMS_AQADM.VERIFY_QUEUE_TYPES(src_queue_name => 'AQ_USER1.Q_1', dest_queue_name => 'AQ_USER2.Q_2',destination => 'dbl_aq_user2.es',rc => rc_value);dbms_output.put_line('rc_value code is '||rc_value);END;/ If propagation is possible then the return code value will be 1. If it is 0 then propagation is not possible and further investigation of the types and transformations used by and in conjunction with the queue tables is required. With regard to comparison of the types the following sql can be used to extract the DDL for a specific type with' %' changed appropriately on the source and target. This can then be compared for the source and target. SET LONG 20000 set pagesize 50 EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE',false); SELECT DBMS_METADATA.GET_DDL('TYPE',t.type_name) from user_types t WHERE t.type_name like '%'; EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'DEFAULT'); 5.3. Check Message State and Destination The first step in this process is to identify the queue table associated with the problem source queue. Although you schedule propagation for a specific queue, most of the meta-data associated with that queue is stored in the underlying queue table. The following statement finds the queue table for a given queue (note that this is a multiple-consumer queue table). SQL> select QUEUE_TABLE from DBA_QUEUES where NAME = 'MULTIPLEQ';QUEUE_TABLE --------------------MULTIPLEQTABLE For a small amount of messages in a multiple-consumer queue table, the following query can be run: SQL> select MSG_STATE, CONSUMER_NAME, ADDRESS from AQ$MULTIPLEQTABLE where QUEUE = 'MULTIPLEQ';MSG_STATE CONSUMER_NAME ADDRESS-------------- ----------------------- -------------READY AQUSER2 [email protected] AQUSER1READY AQUSER3 AQADM.INQ In this example we see 2 messages ready to be propagated to remote queues and 1 that is not. If the address column is blank, the message is not scheduled for propagation and can only be dequeued from the queue upon which it was enqueued. The MSG_STATE column values are discussed in Document 102330.1 Advanced Queueing MSG_STATE Values and their Interpretation. If the address column has a value, the message has been enqueued for propagation to another queue. The first row in the example includes a database link (@M2V102.ES). This demonstrates that the message should be propagated to a queue at a remote database. The third row does not include a database link so will be propagated to a queue that resides on the same database as the source queue. The consumer name is the intended recipient at the target queue. Note that we are not querying the base queue table directly; rather, we are querying a view that is available on top of every queue table, AQ$<queue_table_name>.A more realistic query in an environment where the queue table contains thousands of messages is8.0.3-compatible multiple-consumer queue table and all compatibility single-consumer queue tables select count(*), MSG_STATE, QUEUE from AQ$<queue_table_name> group by MSG_STATE, QUEUE; 8.1.3 and 10.0-compatible queue tables select count(*), MSG_STATE, QUEUE, CONSUMER_NAME from AQ$<queue_table_name>group by MSG_STATE, QUEUE, CONSUMER_NAME; For multiple-consumer queue tables, if you did not see the expected CONSUMER_NAME , check the syntax of the enqueue code and verify the recipients are declared correctly. If a recipients list is not used on enqueue, check the subscriber list in the AQ$_<queue_table_name>_S view (note that a single-consumer queue table does not have a subscriber view. This view records all members of the default subscription list which were added using the DBMS_AQADM.ADD_SUBSCRIBER procedure and also those enqueued using a recipient list. SQL> select QUEUE, NAME, ADDRESS from AQ$MULTIPLEQTABLE_S;QUEUE NAME ADDRESS---------- ----------- -------------MULTIPLEQ AQUSER2 [email protected] AQUSER1 In this example we have 2 subscribers registered with the queue. We have a local subscriber AQUSER1, and a remote subscriber AQUSER2, on the queue INQ, owned by AQADM, at M2V102.ES. Unless overridden with a recipient list during enqueue every message enqueued to this queue will be propagated to INQ at M2V102.ES.For 8.1 style and above multiple consumer queue tables, you can also check the following information at the target: select CONSUMER_NAME, DEQ_TXN_ID, DEQ_TIME, DEQ_USER_ID, PROPAGATED_MSGID from AQ$<queue_table_name> where QUEUE = '<QUEUE_NAME>'; For 8.0 style queues, if the queue table supports multiple consumers you can obtain the same information from the history column of the queue table: select h.CONSUMER, h.TRANSACTION_ID, h.DEQ_TIME, h.DEQ_USER, h.PROPAGATED_MSGIDfrom AQ$<queue_table_name> t, table(t.history) h where t.Q_NAME = '<QUEUE_NAME>'; A non-NULL TRANSACTION_ID indicates that the message was successfully propagated. Further, the DEQ_TIME indicates the time of propagation, the DEQ_USER indicates the userid used for propagation, and the PROPAGATED_MSGID indicates the message ID of the message that was enqueued at the destination. 6. Additional Troubleshooting Steps for Propagation in an Oracle Streams Environment 6.1. Is the Propagation Enabled? For a propagation job to propagate messages, the propagation must be enabled. For Streams, a special view called DBA_PROPAGATION exists to convey information about Streams propagations. If messages are not being propagated by a propagation as expected, then the propagation might not be enabled. To query for this: SELECT p.PROPAGATION_NAME, DECODE(s.SCHEDULE_DISABLED, 'Y', 'Disabled','N', 'Enabled') SCHEDULE_DISABLED, s.PROCESS_NAME, s.FAILURES, s.LAST_ERROR_MSGFROM DBA_QUEUE_SCHEDULES s, DBA_PROPAGATION pWHERE p.DESTINATION_DBLINK = NVL(REGEXP_SUBSTR(s.DESTINATION, '[^@]+', 1, 2), s.DESTINATION) AND s.SCHEMA = p.SOURCE_QUEUE_OWNER AND s.QNAME = p.SOURCE_QUEUE_NAME AND MESSAGE_DELIVERY_MODE = 'PERSISTENT' order by PROPAGATION_NAME; At times, the propagation job may become "broken" or fail to start after an error has been encountered or after a database restart. If an error is indicated by the above query, an attempt to disable the propagation and then re-enable it can be made. In the examples below, for the propagation named STRMADMIN_PROPAGATE where the queue name is STREAMS_QUEUE owned by STRMADMIN and the destination database link is ORCL2.WORLD, the commands would be:10.2 and above exec dbms_propagation_adm.stop_propagation('STRMADMIN_PROPAGATE'); exec dbms_propagation_adm.start_propagation('STRMADMIN_PROPAGATE'); If the above does not fix the problem, stop the propagation specifying the force parameter (2nd parameter on stop_propagation) as TRUE: exec dbms_propagation_adm.stop_propagation('STRMADMIN_PROPAGATE',true); exec dbms_propagation_adm.start_propagation('STRMADMIN_PROPAGATE'); The statistics for the propagation as well as any old error messages are cleared when the force parameter is set to TRUE. Therefore if the propagation schedule is stopped with FORCE set to TRUE, and upon restart there is still an error message in DBA_PROPAGATION, then the error message is current.9.2 or 10.1 exec dbms_aqadm.disable_propagation_schedule('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD'); exec dbms.aqadm.enable_propagation_schedule('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD'); If the above does not fix the problem, perform an unschedule of propagation and then schedule_propagation: exec dbms_aqadm.unschedule_propagation('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD'); exec dbms_aqadm.schedule_propagation('STRMADMIN.STREAMS_QUEUE','ORCL2.WORLD'); Typically if the error from the first query in Section 6.1 recurs after restarting the propagation as shown above, further troubleshooting of the error is needed. 6.2. Check Propagation Rule Sets and Transformations Inspect the configuration of the rules in the rule set that is associated with the propagation process to make sure that they evaluate to TRUE as expected. If not, then the object or schema will not be propagated. Remember that when a negative rule evaluates to TRUE, the specified object or schema will not be propagated. Finally inspect any rule-based transformations that are implemented with propagation to make sure they are changing the data in the intended way.The following query shows what rule sets are assigned to a propagation: select PROPAGATION_NAME, RULE_SET_OWNER||'.'||RULE_SET_NAME "Positive Rule Set",NEGATIVE_RULE_SET_OWNER||'.'||NEGATIVE_RULE_SET_NAME "Negative Rule Set"from DBA_PROPAGATION; The next two queries list the propagation rules and their conditions. The first is for the positive rule set, the second is for the negative rule set: set long 4000select rsr.RULE_SET_OWNER||'.'||rsr.RULE_SET_NAME RULE_SET ,rsr.RULE_OWNER||'.'||rsr.RULE_NAME RULE_NAME,r.RULE_CONDITION CONDITION fromDBA_RULE_SET_RULES rsr, DBA_RULES rwhere rsr.RULE_NAME = r.RULE_NAME and rsr.RULE_OWNER = r.RULE_OWNER and RULE_SET_NAME in(select RULE_SET_NAME from DBA_PROPAGATION) order by rsr.RULE_SET_OWNER, rsr.RULE_SET_NAME; set long 4000select c.PROPAGATION_NAME, rsr.RULE_SET_OWNER||'.'||rsr.RULE_SET_NAME RULE_SET ,rsr.RULE_OWNER||'.'||rsr.RULE_NAME RULE_NAME,r.RULE_CONDITION CONDITION fromDBA_RULE_SET_RULES rsr, DBA_RULES r ,DBA_PROPAGATION cwhere rsr.RULE_NAME = r.RULE_NAME and rsr.RULE_OWNER = r.RULE_OWNER andrsr.RULE_SET_OWNER=c.NEGATIVE_RULE_SET_OWNER and rsr.RULE_SET_NAME=c.NEGATIVE_RULE_SET_NAMEand rsr.RULE_SET_NAME in(select NEGATIVE_RULE_SET_NAME from DBA_PROPAGATION) order by rsr.RULE_SET_OWNER, rsr.RULE_SET_NAME; 6.3. Determining the Total Number of Messages and Bytes Propagated As in Section 3.1, determining if messages are flowing can be instructive to see whether the propagation is entirely hung or just slow. If the propagation is not in flow control (see Section 6.5.2), but the statistics are incrementing slowly, there may be a performance issue. For Streams implementations two views are available that can assist with this that can show the number of messages sent by a propagation, as well as the number of acknowledgements being returned from the target site: the V$PROPAGATION_SENDER view at the Source site and the V$PROPAGATION_RECEIVER view at the destination site. It is helpful to query both to determine if messages are being delivered to the target. Look for the statistics to increase.Source: select QUEUE_SCHEMA, QUEUE_NAME, DBLINK,HIGH_WATER_MARK, ACKNOWLEDGEMENT, TOTAL_MSGS, TOTAL_BYTESfrom V$PROPAGATION_SENDER; Target: select SRC_QUEUE_SCHEMA, SRC_QUEUE_NAME, SRC_DBNAME, DST_QUEUE_SCHEMA, DST_QUEUE_NAME, HIGH_WATER_MARK, ACKNOWLEDGEMENT, TOTAL_MSGS from V$PROPAGATION_RECEIVER; 6.4. Check Buffered Subscribers The V$BUFFERED_SUBSCRIBERS view displays information about subscribers for all buffered queues in the instance. This view can be queried to make sure that the site that the propagation is propagating to is listed as a subscriber address for the site being propagated from: select QUEUE_SCHEMA, QUEUE_NAME, SUBSCRIBER_ADDRESS from V$BUFFERED_SUBSCRIBERS; The SUBSCRIBER_ADDRESS column will not be populated when the propagation is local (between queues on the same database). 6.5. Common Streams Propagation Errors 6.5.1. ORA-02082: A loopback database link must have a connection qualifier. This error can occur if you use the Streams Setup Wizard in Oracle Enterprise Manager without first configuring the GLOBAL_NAME for your database. 6.5.2. ORA-25307: Enqueue rate too high. Enable flow control DBA_QUEUE_SCHEDULES will display this informational message for propagation when the automatic flow control (10g feature of Streams) has been invoked.Similar to Streams capture processes, a Streams propagation process can also go into a state of 'flow control. This is an informative message that indicates flow control has been automatically enabled to reduce the rate at which messages are being enqueued into at target queue.This typically occurs when the target site is unable to keep up with the rate of messages flowing from the source site. Other than checking that the apply process is running normally on the target site, usually no action is required by the DBA. Propagation and the capture process will be resumed automatically when the target site is able to accept more messages.The following document contains more information:Document 302109.1 Streams Propagation Error: ORA-25307 Enqueue rate too high. Enable flow controlSee the following document for one potential cause of this situation:Document 1097115.1 Oracle Streams Apply Reader is in 'Paused' State 6.5.3. ORA-25315 unsupported configuration for propagation of buffered messages This error typically occurs when the target database is RAC and usually indicates that an attempt was made to propagate buffered messages with the database link pointing to an instance in the destination database which is not the owner instance of the destination queue. To resolve the problem, use queue-to-queue propagation for buffered messages. 6.5.4. ORA-600 [KWQBMCRCPTS101] after dropping / recreating propagation For cause/fixes refer to:Document 421237.1 ORA-600 [KWQBMCRCPTS101] reported by a Qmon slave process after dropping a Streams Propagation 6.5.5. Stopping or Dropping a Streams Propagation Hangs See the following note:Document 1159787.1 Troubleshooting Streams Propagation When It is Not Functioning and Attempts to Stop It Hang 6.6. Streams Propagation-Related Notes for Common Issues Document 437838.1 Streams Specific PatchesDocument 749181.1 How to Recover Streams After Dropping PropagationDocument 368912.1 Queue to Queue Propagation Schedule encountered ORA-12514 in a RAC environmentDocument 564649.1 ORA-02068/ORA-03114/ORA-03113 Errors From Streams Propagation Process - Remote Database is Available and Unschedule/Reschedule Does Not ResolveDocument 553017.1 Stream Propagation Process Errors Ora-4052 Ora-6554 From 11g To 10201Document 944846.1 Streams Propagation Fails Ora-7445 [kohrsmc]Document 745601.1 ORA-23603 'STREAMS enqueue aborted due to low SGA' Error from Streams Propagation, and V$STREAMS_CAPTURE.STATE Hanging on 'Enqueuing Message'Document 333068.1 ORA-23603: Streams Enqueue Aborted Eue To Low SGADocument 363496.1 Ora-25315 Propagating on RAC StreamsDocument 368237.1 Unable to Unschedule Propagation. Streams Queue is InvalidDocument 436332.1 dbms_propagation_adm.stop_propagation hangsDocument 727389.1 Propagation Fails With ORA-12528Document 730911.1 ORA-4063 Is Reported After Dropping Negative Prop.RulesetDocument 460471.1 Propagation Blocked by Qmon Process - Streams_queue_table / 'library cache lock' waitsDocument 1165583.1 ORA-600 [kwqpuspse0-ack] In Streams EnvironmentDocument 1059029.1 Combined Capture and Apply (CCA) : Capture aborts : ORA-1422 after schedule_propagationDocument 556309.1 Changing Propagation/ queue_to_queue : false -> true does does not work; no LCRs propagatedDocument 839568.1 Propagation failing with error: ORA-01536: space quota exceeded for tablespace ''Document 311021.1 Streams Propagation Process : Ora 12154 After Reboot with Transparent Application Failover TAF configuredDocument 359971.1 STREAMS propagation to Primary of physical Standby configuation errors with Ora-01033, Ora-02068Document 1101616.1 DBMS_PROPAGATION_ADM.DROP_PROPAGATION FAILS WITH ORA-1747 7. Performance Issues A propagation may seem to be slow if the queries from Sections 3.1 and 6.3 show that the message statistics are not changing quickly. In Oracle Streams, this more usually is due to a slow apply process at the target rather than a slow propagation. Propagation could be inferred to be slow if the message statistics are changing, and the state of a capture process according to V$STREAMS_CAPTURE.STATE is PAUSED FOR FLOW CONTROL, but an ORA-25307 'Enqueue rate too high. Enable flow control' warning is NOT observed in DBA_QUEUE_SCHEDULES per Section 6.5.2. If this is the case, see the following notes / white papers for suggestions to increase performance:Document 335516.1 Master Note for Streams Performance RecommendationsDocument 730036.1 Overview for Troubleshooting Streams Performance IssuesDocument 780733.1 Streams Propagation Tuning with Network ParametersWhite Paper: http://www.oracle.com/technetwork/database/features/availability/maa-wp-10gr2-streams-performance-130059.pdfWhite Paper: Oracle Streams Configuration Best Practices: Oracle Database 10g Release 10.2, http://www.oracle.com/technetwork/database/features/availability/maa-10gr2-streams-configuration-132039.pdf, See APPENDIX A: USING STREAMS CONFIGURATIONS OVER A NETWORKFor basic AQ propagation, the network tuning in the aforementioned Appendix A of the white paper 'Oracle Streams Configuration Best Practices: Oracle Database 10g Release 10.2' is applicable. References NOTE:102330.1 - Advanced Queueing MSG_STATE Values and their InterpretationNOTE:102771.1 - Advanced Queueing Propagation using PL/SQLNOTE:1059029.1 - Combined Capture and Apply (CCA) : Capture aborts : ORA-1422 after schedule_propagationNOTE:1079577.1 - Advanced Queuing Propagation Fails With "ORA-22370: incorrect usage of method"NOTE:1083608.1 - 11g Streams and Oracle SchedulerNOTE:1087324.1 - ORA-01405 ORA-01422 reported by Adavanced Queueing Propagation schedules after RAC reconfigurationNOTE:1097115.1 - Oracle Streams Apply Reader is in 'Paused' StateNOTE:1101616.1 - DBMS_PROPAGATION_ADM.DROP_PROPAGATION FAILS WITH ORA-1747NOTE:1159787.1 - Troubleshooting Streams Propagation When It is Not Functioning and Attempts to Stop It HangNOTE:1165583.1 - ORA-600 [kwqpuspse0-ack] In Streams EnvironmentNOTE:118884.1 - How to unschedule a propagation schedule stuck in pending stateNOTE:1203544.1 - AQ PROPAGATION ABORTED WITH ORA-600[OCIKSIN: INVALID STATUS] ON SYS.DBMS_AQADM_SYS.AQ$_PROPAGATION_PROCEDURE AFTER UPGRADENOTE:1204080.1 - AQ Propagation Failing With ORA-25329 After Upgraded From 8i or 9i to 10g or 11g.NOTE:219416.1 - Advanced Queuing Propagation fails with ORA-22922NOTE:222992.1 - DBMS_AQADM.DISABLE_PROPAGATION_SCHEDULE Returns ORA-24082NOTE:253131.1 - Concurrent Writes May Corrupt LOB Segment When Using Auto Segment Space Management (ORA-1555)NOTE:282987.1 - Propagated Messages marked UNDELIVERABLE after Drop and Recreate Of Remote QueueNOTE:298015.1 - Kwqjswproc:Excep After Loop: Assigning To SelfNOTE:302109.1 - Streams Propagation Error: ORA-25307 Enqueue rate too high. Enable flow controlNOTE:311021.1 - Streams Propagation Process : Ora 12154 After Reboot with Transparent Application Failover TAF configuredNOTE:332792.1 - ORA-04061 error relating to SYS.DBMS_PRVTAQIP reported when setting up StatspackNOTE:333068.1 - ORA-23603: Streams Enqueue Aborted Eue To Low SGANOTE:335516.1 - Master Note for Streams Performance RecommendationsNOTE:353325.1 - ORA-24056: Internal inconsistency for QUEUE and destination NOTE:353754.1 - Streams Messaging Propagation Fails between Single and Multi-byte Charactersets when using Chararacter Length Semantics in the ADT.NOTE:359971.1 - STREAMS propagation to Primary of physical Standby configuation errors with Ora-01033, Ora-02068NOTE:363496.1 - Ora-25315 Propagating on RAC StreamsNOTE:365093.1 - ORA-07445 [kwqppay2aqe()+7360] reported on Propagation of a Transformed MessageNOTE:368237.1 - Unable to Unschedule Propagation. Streams Queue is InvalidNOTE:368912.1 - Queue to Queue Propagation Schedule encountered ORA-12514 in a RAC environmentNOTE:421237.1 - ORA-600 [KWQBMCRCPTS101] reported by a Qmon slave process after dropping a Streams PropagationNOTE:436332.1 - dbms_propagation_adm.stop_propagation hangsNOTE:437838.1 - Streams Specific PatchesNOTE:460471.1 - Propagation Blocked by Qmon Process - Streams_queue_table / 'library cache lock' waitsNOTE:463820.1 - Streams Combined Capture and Apply in 11gNOTE:553017.1 - Stream Propagation Process Errors Ora-4052 Ora-6554 From 11g To 10201NOTE:556309.1 - Changing Propagation/ queue_to_queue : false -> true does does not work; no LCRs propagatedNOTE:564649.1 - ORA-02068/ORA-03114/ORA-03113 Errors From Streams Propagation Process - Remote Database is Available and Unschedule/Reschedule Does Not ResolveNOTE:566622.1 - ORA-22275 when propagating >4K AQ$_JMS_TEXT_MESSAGEs from 9.2.0.8 to 10.2.0.1NOTE:727389.1 - Propagation Fails With ORA-12528NOTE:730036.1 - Overview for Troubleshooting Streams Performance IssuesNOTE:730911.1 - ORA-4063 Is Reported After Dropping Negative Prop.RulesetNOTE:731292.1 - ORA-25215 Reported On Local Propagation When Using Transformation with ANYDATA queue tablesNOTE:731539.1 - ORA-29268: HTTP client error 401 Unauthorized Error when the AQ Servlet attempts to Propagate a message via HTTPNOTE:745601.1 - ORA-23603 'STREAMS enqueue aborted due to low SGA' Error from Streams Propagation, and V$STREAMS_CAPTURE.STATE Hanging on 'Enqueuing Message'NOTE:749181.1 - How to Recover Streams After Dropping PropagationNOTE:780733.1 - Streams Propagation Tuning with Network ParametersNOTE:787367.1 - ORA-22275 reported on Propagating Messages with LOB component when propagating between 10.1 and 10.2NOTE:808136.1 - How to clear the old errors from DBA_PROPAGATION view ?NOTE:827184.1 - AQ Propagation with CLOB data types Fails with ORA-22990NOTE:827473.1 - How to alter propagation from queue_to_queue to queue_to_dblinkNOTE:839568.1 - Propagation failing with error: ORA-01536: space quota exceeded for tablespace ''NOTE:846297.1 - AQ Propagation Fails : ORA-00600[kope2upic2954] or Ora-00600[Kghsstream_copyn]NOTE:944846.1 - Streams Propagation Fails Ora-7445 [kohrsmc]

Read the article
Toorcon 15 (2013)

- by danx

The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

Read the article
Quick guide to Oracle IRM 11g: Classification design

- by Simon Thorpe

Quick guide to Oracle IRM 11g indexThis is the final article in the quick guide to Oracle IRM. If you've followed everything prior you will now have a fully functional and tested Information Rights Management service. It doesn't matter if you've been following the 10g or 11g guide as this next article is common to both. ContentsWhy this is the most important part... Understanding the classification and standard rights model Identifying business use cases Creating an effective IRM classification modelOne single classification across the entire businessA context for each and every possible granular use caseWhat makes a good context? Deciding on the use of roles in the context Reviewing the features and security for context roles Summary Why this is the most important part...Now the real work begins, installing and getting an IRM system running is as simple as following instructions. However to actually have an IRM technology easily protecting your most sensitive information without interfering with your users existing daily work flows and be able to scale IRM across the entire business, requires thought into how confidential documents are created, used and distributed. This article is going to give you the information you need to ask the business the right questions so that you can deploy your IRM service successfully. The IRM team here at Oracle have over 10 years of experience in helping customers and it is important you understand the following to be successful in securing access to your most confidential information. Whatever you are trying to secure, be it mergers and acquisitions information, engineering intellectual property, health care documentation or financial reports. No matter what type of user is going to access the information, be they employees, contractors or customers, there are common goals you are always trying to achieve.Securing the content at the earliest point possible and do it automatically. Removing the dependency on the user to decide to secure the content reduces the risk of mistakes significantly and therefore results a more secure deployment. K.I.S.S. (Keep It Simple Stupid) Reduce complexity in the rights/classification model. Oracle IRM lets you make changes to access to documents even after they are secured which allows you to start with a simple model and then introduce complexity once you've understood how the technology is going to be used in the business. After an initial learning period you can review your implementation and start to make informed decisions based on user feedback and administration experience. Clearly communicate to the user, when appropriate, any changes to their existing work practice. You must make every effort to make the transition to sealed content as simple as possible. For external users you must help them understand why you are securing the documents and inform them the value of the technology to both your business and them. Before getting into the detail, I must pay homage to Martin White, Vice President of client services in SealedMedia, the company Oracle acquired and who created Oracle IRM. In the SealedMedia years Martin was involved with every single customer and was key to the design of certain aspects of the IRM technology, specifically the context model we will be discussing here. Listening carefully to customers and understanding the flexibility of the IRM technology, Martin taught me all the skills of helping customers build scalable, effective and simple to use IRM deployments. No matter how well the engineering department designed the software, badly designed and poorly executed projects can result in difficult to use and manage, and ultimately insecure solutions. The advice and information that follows was born with Martin and he's still delivering IRM consulting with customers and can be found at www.thinkers.co.uk. It is from Martin and others that Oracle not only has the most advanced, scalable and usable document security solution on the market, but Oracle and their partners have the most experience in delivering successful document security solutions. Understanding the classification and standard rights model The goal of any successful IRM deployment is to balance the increase in security the technology brings without over complicating the way people use secured content and avoid a significant increase in administration and maintenance. With Oracle it is possible to automate the protection of content, deploy the desktop software transparently and use authentication methods such that users can open newly secured content initially unaware the document is any different to an insecure one. That is until of course they attempt to do something for which they don't have any rights, such as copy and paste to an insecure application or try and print. Central to achieving this objective is creating a classification model that is simple to understand and use but also provides the right level of complexity to meet the business needs. In Oracle IRM the term used for each classification is a "context". A context defines the relationship between.A group of related documents The people that use the documents The roles that these people perform The rights that these people need to perform their role The context is the key to the success of Oracle IRM. It provides the separation of the role and rights of a user from the content itself. Documents are sealed to contexts but none of the rights, user or group information is stored within the content itself. Sealing only places information about the location of the IRM server that sealed it, the context applied to the document and a few other pieces of metadata that pertain only to the document. This important separation of rights from content means that millions of documents can be secured against a single classification and a user needs only one right assigned to be able to access all documents. If you have followed all the previous articles in this guide, you will be ready to start defining contexts to which your sensitive information will be protected. But before you even start with IRM, you need to understand how your own business uses and creates sensitive documents and emails. Identifying business use cases Oracle is able to support multiple classification systems, but usually there is one single initial need for the technology which drives a deployment. This need might be to protect sensitive mergers and acquisitions information, engineering intellectual property, financial documents. For this and every subsequent use case you must understand how users create and work with documents, to who they are distributed and how the recipients should interact with them. A successful IRM deployment should start with one well identified use case (we go through some examples towards the end of this article) and then after letting this use case play out in the business, you learn how your users work with content, how well your communication to the business worked and if the classification system you deployed delivered the right balance. It is at this point you can start rolling the technology out further. Creating an effective IRM classification model Once you have selected the initial use case you will address with IRM, you need to design a classification model that defines the access to secured documents within the use case. In Oracle IRM there is an inbuilt classification system called the "context" model. In Oracle IRM 11g it is possible to extend the server to support any rights classification model, but the majority of users who are not using an application integration (such as Oracle IRM within Oracle Beehive) are likely to be starting out with the built in context model. Before looking at creating a classification system with IRM, it is worth reviewing some recognized standards and methods for creating and implementing security policy. A very useful set of documents are the ISO 17799 guidelines and the SANS security policy templates. First task is to create a context against which documents are to be secured. A context consists of a group of related documents (all top secret engineering research), a list of roles (contributors and readers) which define how users can access documents and a list of users (research engineers) who have been given a role allowing them to interact with sealed content. Before even creating the first context it is wise to decide on a philosophy which will dictate the level of granularity, the question is, where do you start? At a department level? By project? By technology? First consider the two ends of the spectrum... One single classification across the entire business Imagine that instead of having separate contexts, one for engineering intellectual property, one for your financial data, one for human resources personally identifiable information, you create one context for all documents across the entire business. Whilst you may have immediate objections, there are some significant benefits in thinking about considering this. Document security classification decisions are simple. You only have one context to chose from! User provisioning is simple, just make sure everyone has a role in the only context in the business. Administration is very low, if you assign rights to groups from the business user repository you probably never have to touch IRM administration again. There are however some obvious downsides to this model.All users in have access to all IRM secured content. So potentially a sales person could access sensitive mergers and acquisition documents, if they can get their hands on a copy that is. You cannot delegate control of different documents to different parts of the business, this may not satisfy your regulatory requirements for the separation and delegation of duties. Changing a users role affects every single document ever secured. Even though it is very unlikely a business would ever use one single context to secure all their sensitive information, thinking about this scenario raises one very important point. Just having one single context and securing all confidential documents to it, whilst incurring some of the problems detailed above, has one huge value. Once secured, IRM protected content can ONLY be accessed by authorized users. Just think of all the sensitive documents in your business today, imagine if you could ensure that only everyone you trust could open them. Even if an employee lost a laptop or someone accidentally sent an email to the wrong recipient, only the right people could open that file. A context for each and every possible granular use case Now let's think about the total opposite of a single context design. What if you created a context for each and every single defined business need and created multiple contexts within this for each level of granularity? Let's take a use case where we need to protect engineering intellectual property. Imagine we have 6 different engineering groups, and in each we have a research department, a design department and manufacturing. The company information security policy defines 3 levels of information sensitivity... restricted, confidential and top secret. Then let's say that each group and department needs to define access to information from both internal and external users. Finally add into the mix that they want to review the rights model for each context every financial quarter. This would result in a huge amount of contexts. For example, lets just look at the resulting contexts for one engineering group. Q1FY2010 Restricted Internal - Engineering Group 1 - Research Q1FY2010 Restricted Internal - Engineering Group 1 - Design Q1FY2010 Restricted Internal - Engineering Group 1 - Manufacturing Q1FY2010 Restricted External- Engineering Group 1 - Research Q1FY2010 Restricted External - Engineering Group 1 - Design Q1FY2010 Restricted External - Engineering Group 1 - Manufacturing Q1FY2010 Confidential Internal - Engineering Group 1 - Research Q1FY2010 Confidential Internal - Engineering Group 1 - Design Q1FY2010 Confidential Internal - Engineering Group 1 - Manufacturing Q1FY2010 Confidential External - Engineering Group 1 - Research Q1FY2010 Confidential External - Engineering Group 1 - Design Q1FY2010 Confidential External - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret Internal - Engineering Group 1 - Research Q1FY2010 Top Secret Internal - Engineering Group 1 - Design Q1FY2010 Top Secret Internal - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret External - Engineering Group 1 - Research Q1FY2010 Top Secret External - Engineering Group 1 - Design Q1FY2010 Top Secret External - Engineering Group 1 - Manufacturing Now multiply the above by 6 for each engineering group, 18 contexts. You are then creating/reviewing another 18 every 3 months. After a year you've got 72 contexts. What would be the advantages of such a complex classification model? You can satisfy very granular rights requirements, for example only an authorized engineering group 1 researcher can create a top secret report for access internally, and his role will be reviewed on a very frequent basis. Your business may have very complex rights requirements and mapping this directly to IRM may be an obvious exercise. The disadvantages of such a classification model are significant...Huge administrative overhead. Someone in the business must manage, review and administrate each of these contexts. If the engineering group had a single administrator, they would have 72 classifications to reside over each year. From an end users perspective life will be very confusing. Imagine if a user has rights in just 6 of these contexts. They may be able to print content from one but not another, be able to edit content in 2 contexts but not the other 4. Such confusion at the end user level causes frustration and resistance to the use of the technology. Increased synchronization complexity. Imagine a user who after 3 years in the company ends up with over 300 rights in many different contexts across the business. This would result in long synchronization times as the client software updates all your offline rights. Hard to understand who can do what with what. Imagine being the VP of engineering and as part of an internal security audit you are asked the question, "What rights to researchers have to our top secret information?". In this complex model the answer is not simple, it would depend on many roles in many contexts. Of course this example is extreme, but it highlights that trying to build many barriers in your business can result in a nightmare of administration and confusion amongst users. In the real world what we need is a balance of the two. We need to seek an optimum number of contexts. Too many contexts are unmanageable and too few contexts does not give fine enough granularity. What makes a good context? Good context design derives mainly from how well you understand your business requirements to secure access to confidential information. Some customers I have worked with can tell me exactly the documents they wish to secure and know exactly who should be opening them. However there are some customers who know only of the government regulation that requires them to control access to certain types of information, they don't actually know where the documents are, how they are created or understand exactly who should have access. Therefore you need to know how to ask the business the right questions that lead to information which help you define a context. First ask these questions about a set of documentsWhat is the topic? Who are legitimate contributors on this topic? Who are the authorized readership? If the answer to any one of these is significantly different, then it probably merits a separate context. Remember that sealed documents are inherently secure and as such they cannot leak to your competitors, therefore it is better sealed to a broad context than not sealed at all. Simplicity is key here. Always revert to the first extreme example of a single classification, then work towards essential complexity. If there is any doubt, always prefer fewer contexts. Remember, Oracle IRM allows you to change your mind later on. You can implement a design now and continue to change and refine as you learn how the technology is used. It is easy to go from a simple model to a more complex one, it is much harder to take a complex model that is already embedded in the work practice of users and try to simplify it. It is also wise to take a single use case and address this first with the business. Don't try and tackle many different problems from the outset. Do one, learn from the process, refine it and then take what you have learned into the next use case, refine and continue. Once you have a good grasp of the technology and understand how your business will use it, you can then start rolling out the technology wider across the business. Deciding on the use of roles in the context Once you have decided on that first initial use case and a context to create let's look at the details you need to decide upon. For each context, identify; Administrative rolesBusiness owner, the person who makes decisions about who may or may not see content in this context. This is often the person who wanted to use IRM and drove the business purchase. They are the usually the person with the most at risk when sensitive information is lost. Point of contact, the person who will handle requests for access to content. Sometimes the same as the business owner, sometimes a trusted secretary or administrator. Context administrator, the person who will enact the decisions of the Business Owner. Sometimes the point of contact, sometimes a trusted IT person. Document related rolesContributors, the people who create and edit documents in this context. Reviewers, the people who are involved in reviewing documents but are not trusted to secure information to this classification. This role is not always necessary. (See later discussion on Published-work and Work-in-Progress) Readers, the people who read documents from this context. Some people may have several of the roles above, which is fine. What you are trying to do is understand and define how the business interacts with your sensitive information. These roles obviously map directly to roles available in Oracle IRM. Reviewing the features and security for context roles At this point we have decided on a classification of information, understand what roles people in the business will play when administrating this classification and how they will interact with content. The final piece of the puzzle in getting the information for our first context is to look at the permissions people will have to sealed documents. First think why are you protecting the documents in the first place? It is to prevent the loss of leaking of information to the wrong people. To control the information, making sure that people only access the latest versions of documents. You are not using Oracle IRM to prevent unauthorized people from doing legitimate work. This is an important point, with IRM you can erect many barriers to prevent access to content yet too many restrictions and authorized users will often find ways to circumvent using the technology and end up distributing unprotected originals. Because IRM is a security technology, it is easy to get carried away restricting different groups. However I would highly recommend starting with a simple solution with few restrictions. Ensure that everyone who reasonably needs to read documents can do so from the outset. Remember that with Oracle IRM you can change rights to content whenever you wish and tighten security. Always return to the fact that the greatest value IRM brings is that ONLY authorized users can access secured content, remember that simple "one context for the entire business" model. At the start of the deployment you really need to aim for user acceptance and therefore a simple model is more likely to succeed. As time passes and users understand how IRM works you can start to introduce more restrictions and complexity. Another key aspect to focus on is handling exceptions. If you decide on a context model where engineering can only access engineering information, and sales can only access sales data. Act quickly when a sales manager needs legitimate access to a set of engineering documents. Having a quick and effective process for permitting other people with legitimate needs to obtain appropriate access will be rewarded with acceptance from the user community. These use cases can often be satisfied by integrating IRM with a good Identity & Access Management technology which simplifies the process of assigning users the correct business roles. The big print issue... Printing is often an issue of contention, users love to print but the business wants to ensure sensitive information remains in the controlled digital world. There are many cases of physical document loss causing a business pain, it is often overlooked that IRM can help with this issue by limiting the ability to generate physical copies of digital content. However it can be hard to maintain a balance between security and usability when it comes to printing. Consider the following points when deciding about whether to give print rights. Oracle IRM sealed documents can contain watermarks that expose information about the user, time and location of access and the classification of the document. This information would reside in the printed copy making it easier to trace who printed it. Printed documents are slower to distribute in comparison to their digital counterparts, so time sensitive information in printed format may present a lower risk. Print activity is audited, therefore you can monitor and react to users abusing print rights. Summary In summary it is important to think carefully about the way you create your context model. As you ask the business these questions you may get a variety of different requirements. There may be special projects that require a context just for sensitive information created during the lifetime of the project. There may be a department that requires all information in the group is secured and you might have a few senior executives who wish to use IRM to exchange a small number of highly sensitive documents with a very small number of people. Oracle IRM, with its very flexible context classification system, can support all of these use cases. The trick is to introducing the complexity to deliver them at the right level. In another article i'm working on I will go through some examples of how Oracle IRM might map to existing business use cases. But for now, this article covers all the important questions you need to get your IRM service deployed and successfully protecting your most sensitive information.

Read the article
Setting up a local AI server - easy with Solaris 11

- by Stefan Hinker

Many things are new in Solaris 11, Autoinstall is one of them. If, like me, you've known Jumpstart for the last 2 centuries or so, you'll have to start from scratch. Well, almost, as the concepts are similar, and it's not all that difficult. Just new. I wanted to have an AI server that I could use for demo purposes, on the train if need be. That answers the question of hardware requirements: portable. But let's start at the beginning. First, you need an OS image, of course. In the new world of Solaris 11, it is now called a repository. The original can be downloaded from the Solaris 11 page at Oracle. What you want is the "Oracle Solaris 11 11/11 Repository Image", which comes in two parts that can be combined using cat. MD5 checksums for these (and all other downloads from that page) are available closer to the top of the page. With that, building the repository is quick and simple: # zfs create -o mountpoint=/export/repo rpool/ai/repo # zfs create rpool/ai/repo/s11 # mount -o ro -F hsfs /tmp/sol-11-1111-repo-full.iso /mnt # rsync -aP /mnt/repo /export/repo/s11 # umount /mnt # pkgrepo rebuild -s /export/repo/sol11/repo # zfs snapshot rpool/ai/repo/sol11@fcs # pkgrepo info -s /export/repo/sol11/repo PUBLISHER PACKAGES STATUS UPDATED solaris 4292 online 2012-03-12T20:47:15.378639Z That's all there's to it. Let's make a snapshot, just to be on the safe side. You never know when one will come in handy. To use this repository, you could just add it as a file-based publisher: # pkg set-publisher -g file:///export/repo/sol11/repo solaris In case I'd want to access this repository through a (virtual) network, i'll now quickly activate the repository-service: # svccfg -s application/pkg/server \ setprop pkg/inst_root=/export/repo/sol11/repo # svccfg -s application/pkg/server setprop pkg/readonly=true # svcadm refresh application/pkg/server # svcadm enable application/pkg/server That's all you need - now point your browser to http://localhost/ to view your beautiful repository-server. Step 1 is done. All of this, by the way, is nicely documented in the README file that's contained in the repository image. Of course, we already have updates to the original release. You can find them in MOS in the Oracle Solaris 11 Support Repository Updates (SRU) Index. You can simply add these to your existing repository or create separate repositories for each SRU. The individual SRUs are self-sufficient and incremental - SRU4 includes all updates from SRU2 and SRU3. With ZFS, you can also get both: A full repository with all updates and at the same time incremental ones up to each of the updates: # mount -o ro -F hsfs /tmp/sol-11-1111-sru4-05-incr-repo.iso /mnt # pkgrecv -s /mnt/repo -d /export/repo/sol11/repo '*' # umount /mnt # pkgrepo rebuild -s /export/repo/sol11/repo # zfs snapshot rpool/ai/repo/sol11@sru4 # zfs set snapdir=visible rpool/ai/repo/sol11 # svcadm restart svc:/application/pkg/server:default The normal repository is now updated to SRU4. Thanks to the ZFS snapshots, there is also a valid repository of Solaris 11 11/11 without the update located at /export/repo/sol11/.zfs/snapshot/fcs . If you like, you can also create another repository service for each update, running on a separate port. But now lets continue with the AI server. Just a little bit of reading in the dokumentation makes it clear that we will need to run a DHCP server for this. Since I already have one active (for my SunRay installation) and since it's a good idea to have these kinds of services separate anyway, I decided to create this in a Zone. So, let's create one first: # zfs create -o mountpoint=/export/install rpool/ai/install # zfs create -o mountpoint=/zones rpool/zones # zonecfg -z ai-server zonecfg:ai-server> create create: Using system default template 'SYSdefault' zonecfg:ai-server> set zonepath=/zones/ai-server zonecfg:ai-server> add dataset zonecfg:ai-server:dataset> set name=rpool/ai/install zonecfg:ai-server:dataset> set alias=install zonecfg:ai-server:dataset> end zonecfg:ai-server> commit zonecfg:ai-server> exit # zoneadm -z ai-server install # zoneadm -z ai-server boot ; zlogin -C ai-server Give it a hostname and IP address at first boot, and there's the Zone. For a publisher for Solaris packages, it will be bound to the "System Publisher" from the Global Zone. The /export/install filesystem, of course, is intended to be used by the AI server. Let's configure it now: #zlogin ai-server root@ai-server:~# pkg install install/installadm root@ai-server:~# installadm create-service -n x86-fcs -a i386 \ -s pkg://solaris/install-image/[email protected],5.11-0.175.0.0.0.2.1482 \ -d /export/install/fcs -i 192.168.2.20 -c 3 With that, the core AI server is already done. What happened here? First, I installed the AI server software. IPS makes that nice and easy. If necessary, it'll also pull in the required DHCP-Server and anything else that might be missing. Watch out for that DHCP server software. In Solaris 11, there are two different versions. There's the one you might know from Solaris 10 and earlier, and then there's a new one from ISC. The latter is the one we need for AI. The SMF service names of both are very similar. The "old" one is "svc:/network/dhcp-server:default". The ISC-server comes with several SMF-services. We at least need "svc:/network/dhcp/server:ipv4". The command "installadm create-service" creates the installation-service. It's called "x86-fcs", serves the "i386" architecture and gets its boot image from the repository of the system publisher, using version 5.11,5.11-0.175.0.0.0.2.1482, which is Solaris 11 11/11. (The option "-a i386" in this example is optional, since the installserver itself runs on a x86 machine.) The boot-environment for clients is created in /export/install/fcs and the DHCP-server is configured for 3 IP-addresses starting at 192.168.2.20. This configuration is stored in a very human readable form in /etc/inet/dhcpd4.conf. An AI-service for SPARC systems could be created in the very same way, using "-a sparc" as the architecture option. Now we would be ready to register and install the first client. It would be installed with the default "solaris-large-server" using the publisher "http://pkg.oracle.com/solaris/release" and would query it's configuration interactively at first boot. This makes it very clear that an AI-server is really only a boot-server. The true source of packets to install can be different. Since I don't like these defaults for my demo setup, I did some extra config work for my clients. The configuration of a client is controlled by manifests and profiles. The manifest controls which packets are installed and how the filesystems are layed out. In that, it's very much like the old "rules.ok" file in Jumpstart. Profiles contain additional configuration like root passwords, primary user account, IP addresses, keyboard layout etc. Hence, profiles are very similar to the old sysid.cfg file. The easiest way to get your hands on a manifest is to ask the AI server we just created to give us it's default one. Then modify that to our liking and give it back to the installserver to use: root@ai-server:~# mkdir -p /export/install/configs/manifests root@ai-server:~# cd /export/install/configs/manifests root@ai-server:~# installadm export -n x86-fcs -m orig_default \ -o orig_default.xml root@ai-server:~# cp orig_default.xml s11-fcs.small.local.xml root@ai-server:~# vi s11-fcs.small.local.xml root@ai-server:~# more s11-fcs.small.local.xml <!DOCTYPE auto_install SYSTEM "file:///usr/share/install/ai.dtd.1"> <auto_install> <ai_instance name="S11 Small fcs local"> <target> <logical> <zpool name="rpool" is_root="true"> <filesystem name="export" mountpoint="/export"/> <filesystem name="export/home"/> <be name="solaris"/> </zpool> </logical> </target> <software type="IPS"> <destination> <image>  <facet set="false">facet.locale.*</facet> <facet set="true">facet.locale.de</facet> <facet set="true">facet.locale.de_DE</facet> <facet set="true">facet.locale.en</facet> <facet set="true">facet.locale.en_US</facet> </image> </destination> <source> <publisher name="solaris"> <origin name="http://192.168.2.12/"/> </publisher> </source>  <software_data action="install"> <name>pkg:/[email protected],5.11-0.175.0.0.0.2.0</name> <name>pkg:/group/system/solaris-small-server</name> </software_data> </software> </ai_instance> </auto_install> root@ai-server:~# installadm create-manifest -n x86-fcs -d \ -f ./s11-fcs.small.local.xml root@ai-server:~# installadm list -m -n x86-fcs Manifest Status Criteria -------- ------ -------- S11 Small fcs local Default None orig_default Inactive None The major points in this new manifest are: Install "solaris-small-server" Install a few locales less than the default. I'm not that fluid in French or Japanese... Use my own package service as publisher, running on IP address 192.168.2.12 Install the initial release of Solaris 11: pkg:/[email protected],5.11-0.175.0.0.0.2.0 Using a similar approach, I'll create a default profile interactively and use it as a template for a few customized building blocks, each defining a part of the overall system configuration. The modular approach makes it easy to configure numerous clients later on: root@ai-server:~# mkdir -p /export/install/configs/profiles root@ai-server:~# cd /export/install/configs/profiles root@ai-server:~# sysconfig create-profile -o default.xml root@ai-server:~# cp default.xml general.xml; cp default.xml mars.xml root@ai-server:~# cp default.xml user.xml root@ai-server:~# vi general.xml mars.xml user.xml root@ai-server:~# more general.xml mars.xml user.xml :::::::::::::: general.xml :::::::::::::: <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"> <service_bundle type="profile" name="sysconfig"> <service version="1" type="service" name="system/timezone"> <instance enabled="true" name="default"> <property_group type="application" name="timezone"> <propval type="astring" name="localtime" value="Europe/Berlin"/> </property_group> </instance> </service> <service version="1" type="service" name="system/environment"> <instance enabled="true" name="init"> <property_group type="application" name="environment"> <propval type="astring" name="LANG" value="C"/> </property_group> </instance> </service> <service version="1" type="service" name="system/keymap"> <instance enabled="true" name="default"> <property_group type="system" name="keymap"> <propval type="astring" name="layout" value="US-English"/> </property_group> </instance> </service> <service version="1" type="service" name="system/console-login"> <instance enabled="true" name="default"> <property_group type="application" name="ttymon"> <propval type="astring" name="terminal_type" value="vt100"/> </property_group> </instance> </service> <service version="1" type="service" name="network/physical"> <instance enabled="true" name="default"> <property_group type="application" name="netcfg"> <propval type="astring" name="active_ncp" value="DefaultFixed"/> </property_group> </instance> </service> <service version="1" type="service" name="system/name-service/switch"> <property_group type="application" name="config"> <propval type="astring" name="default" value="files"/> <propval type="astring" name="host" value="files dns"/> <propval type="astring" name="printer" value="user files"/> </property_group> <instance enabled="true" name="default"/> </service> <service version="1" type="service" name="system/name-service/cache"> <instance enabled="true" name="default"/> </service> <service version="1" type="service" name="network/dns/client"> <property_group type="application" name="config"> <property type="net_address" name="nameserver"> <net_address_list> <value_node value="192.168.2.1"/> </net_address_list> </property> </property_group> <instance enabled="true" name="default"/> </service> </service_bundle> :::::::::::::: mars.xml :::::::::::::: <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"> <service_bundle type="profile" name="sysconfig"> <service version="1" type="service" name="network/install"> <instance enabled="true" name="default"> <property_group type="application" name="install_ipv4_interface"> <propval type="astring" name="address_type" value="static"/> <propval type="net_address_v4" name="static_address" value="192.168.2.100/24"/> <propval type="astring" name="name" value="net0/v4"/> <propval type="net_address_v4" name="default_route" value="192.168.2.1"/> </property_group> <property_group type="application" name="install_ipv6_interface"> <propval type="astring" name="stateful" value="yes"/> <propval type="astring" name="stateless" value="yes"/> <propval type="astring" name="address_type" value="addrconf"/> <propval type="astring" name="name" value="net0/v6"/> </property_group> </instance> </service> <service version="1" type="service" name="system/identity"> <instance enabled="true" name="node"> <property_group type="application" name="config"> <propval type="astring" name="nodename" value="mars"/> </property_group> </instance> </service> </service_bundle> :::::::::::::: user.xml :::::::::::::: <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1"> <service_bundle type="profile" name="sysconfig"> <service version="1" type="service" name="system/config-user"> <instance enabled="true" name="default"> <property_group type="application" name="root_account"> <propval type="astring" name="login" value="root"/> <propval type="astring" name="password" value="noIWillNotTellYouMyPasswordNotEvenEncrypted"/> <propval type="astring" name="type" value="role"/> </property_group> <property_group type="application" name="user_account"> <propval type="astring" name="login" value="stefan"/> <propval type="astring" name="password" value="noIWillNotTellYouMyPasswordNotEvenEncrypted"/> <propval type="astring" name="type" value="normal"/> <propval type="astring" name="description" value="Stefan Hinker"/> <propval type="count" name="uid" value="12345"/> <propval type="count" name="gid" value="10"/> <propval type="astring" name="shell" value="/usr/bin/bash"/> <propval type="astring" name="roles" value="root"/> <propval type="astring" name="profiles" value="System Administrator"/> <propval type="astring" name="sudoers" value="ALL=(ALL) ALL"/> </property_group> </instance> </service> </service_bundle> root@ai-server:~# installadm create-profile -n x86-fcs -f general.xml root@ai-server:~# installadm create-profile -n x86-fcs -f user.xml root@ai-server:~# installadm create-profile -n x86-fcs -f mars.xml \ -c ipv4=192.168.2.100 root@ai-server:~# installadm list -p Service Name Profile ------------ ------- x86-fcs general.xml mars.xml user.xml root@ai-server:~# installadm list -n x86-fcs -p Profile Criteria ------- -------- general.xml None mars.xml ipv4 = 192.168.2.100 user.xml None Here's the idea behind these files: "general.xml" contains settings valid for all my clients. Stuff like DNS servers, for example, which in my case will always be the same. "user.xml" only contains user definitions. That is, a root password and a primary user.Both of these profiles will be valid for all clients (for now). "mars.xml" defines network settings for an individual client. This profile is associated with an IP-Address. For this to work, I'll have to tweak the DHCP-settings in the next step: root@ai-server:~# installadm create-client -e 08:00:27:AA:3D:B1 -n x86-fcs root@ai-server:~# vi /etc/inet/dhcpd4.conf root@ai-server:~# tail -5 /etc/inet/dhcpd4.conf host 080027AA3DB1 { hardware ethernet 08:00:27:AA:3D:B1; fixed-address 192.168.2.100; filename "01080027AA3DB1"; } This completes the client preparations. I manually added the IP-Address for mars to /etc/inet/dhcpd4.conf. This is needed for the "mars.xml" profile. Disabling arbitrary DHCP-replies will shut up this DHCP server, making my life in a shared environment a lot more peaceful ;-)Now, I of course want this installation to be completely hands-off. For this to work, I'll need to modify the grub boot menu for this client slightly. You can find it in /etc/netboot. "installadm create-client" will create a new boot menu for every client, identified by the client's MAC address. The template for this can be found in a subdirectory with the name of the install service, /etc/netboot/x86-fcs in our case. If you don't want to change this manually for every client, modify that template to your liking instead. root@ai-server:~# cd /etc/netboot root@ai-server:~# cp menu.lst.01080027AA3DB1 menu.lst.01080027AA3DB1.org root@ai-server:~# vi menu.lst.01080027AA3DB1 root@ai-server:~# diff menu.lst.01080027AA3DB1 menu.lst.01080027AA3DB1.org 1,2c1,2 < default=1 < timeout=10 --- > default=0 > timeout=30 root@ai-server:~# more menu.lst.01080027AA3DB1 default=1 timeout=10 min_mem64=0 title Oracle Solaris 11 11/11 Text Installer and command line kernel$ /x86-fcs/platform/i86pc/kernel/$ISADIR/unix -B install_media=htt p://$serverIP:5555//export/install/fcs,install_service=x86-fcs,install_svc_addre ss=$serverIP:5555 module$ /x86-fcs/platform/i86pc/$ISADIR/boot_archive title Oracle Solaris 11 11/11 Automated Install kernel$ /x86-fcs/platform/i86pc/kernel/$ISADIR/unix -B install=true,inst all_media=http://$serverIP:5555//export/install/fcs,install_service=x86-fcs,inst all_svc_address=$serverIP:5555,livemode=text module$ /x86-fcs/platform/i86pc/$ISADIR/boot_archive Now just boot the client off the network using PXE-boot. For my demo purposes, that's a client from VirtualBox, of course. That's all there's to it. And despite the fact that this blog entry is a little longer - that wasn't that hard now, was it?

Read the article
Basic Spatial Data with SQL Server and Entity Framework 5.0

- by Rick Strahl

In my most recent project we needed to do a bit of geo-spatial referencing. While spatial features have been in SQL Server for a while using those features inside of .NET applications hasn't been as straight forward as could be, because .NET natively doesn't support spatial types. There are workarounds for this with a few custom project like SharpMap or a hack using the Sql Server specific Geo types found in the Microsoft.SqlTypes assembly that ships with SQL server. While these approaches work for manipulating spatial data from .NET code, they didn't work with database access if you're using Entity Framework. Other ORM vendors have been rolling their own versions of spatial integration. In Entity Framework 5.0 running on .NET 4.5 the Microsoft ORM finally adds support for spatial types as well. In this post I'll describe basic geography features that deal with single location and distance calculations which is probably the most common usage scenario. SQL Server Transact-SQL Syntax for Spatial Data Before we look at how things work with Entity framework, lets take a look at how SQL Server allows you to use spatial data to get an understanding of the underlying semantics. The following SQL examples should work with SQL 2008 and forward. Let's start by creating a test table that includes a Geography field and also a pair of Long/Lat fields that demonstrate how you can work with the geography functions even if you don't have geography/geometry fields in the database. Here's the CREATE command:CREATE TABLE [dbo].[Geo]( [id] [int] IDENTITY(1,1) NOT NULL, [Location] [geography] NULL, [Long] [float] NOT NULL, [Lat] [float] NOT NULL ) Now using plain SQL you can insert data into the table using geography::STGeoFromText SQL CLR function:insert into Geo( Location , long, lat ) values ( geography::STGeomFromText ('POINT(-121.527200 45.712113)', 4326), -121.527200, 45.712113 ) insert into Geo( Location , long, lat ) values ( geography::STGeomFromText ('POINT(-121.517265 45.714240)', 4326), -121.517265, 45.714240 ) insert into Geo( Location , long, lat ) values ( geography::STGeomFromText ('POINT(-121.511536 45.714825)', 4326), -121.511536, 45.714825) The STGeomFromText function accepts a string that points to a geometric item (a point here but can also be a line or path or polygon and many others). You also need to provide an SRID (Spatial Reference System Identifier) which is an integer value that determines the rules for how geography/geometry values are calculated and returned. For mapping/distance functionality you typically want to use 4326 as this is the format used by most mapping software and geo-location libraries like Google and Bing. The spatial data in the Location field is stored in binary format which looks something like this: Once the location data is in the database you can query the data and do simple distance computations very easily. For example to calculate the distance of each of the values in the database to another spatial point is very easy to calculate. Distance calculations compare two points in space using a direct line calculation. For our example I'll compare a new point to all the points in the database. Using the Location field the SQL looks like this:-- create a source point DECLARE @s geography SET @s = geography:: STGeomFromText('POINT(-121.527200 45.712113)' , 4326); --- return the ids select ID, Location as Geo , Location .ToString() as Point , @s.STDistance( Location) as distance from Geo order by distance The code defines a new point which is the base point to compare each of the values to. You can also compare values from the database directly, but typically you'll want to match a location to another location and determine the difference for which you can use the geography::STDistance function. This query produces the following output: The STDistance function returns the straight line distance between the passed in point and the point in the database field. The result for SRID 4326 is always in meters. Notice that the first value passed was the same point so the difference is 0. The other two points are two points here in town in Hood River a little ways away - 808 and 1256 meters respectively. Notice also that you can order the result by the resulting distance, which effectively gives you results that are ordered radially out from closer to further away. This is great for searches of points of interest near a central location (YOU typically!). These geolocation functions are also available to you if you don't use the Geography/Geometry types, but plain float values. It's a little more work, as each point has to be created in the query using the string syntax, but the following code doesn't use a geography field but produces the same result as the previous query.--- using float fields select ID, geography::STGeomFromText ('POINT(' + STR (long, 15,7 ) + ' ' + Str(lat ,15, 7) + ')' , 4326), geography::STGeomFromText ('POINT(' + STR (long, 15,7 ) + ' ' + Str(lat ,15, 7) + ')' , 4326). ToString(), @s.STDistance( geography::STGeomFromText ('POINT(' + STR(long ,15, 7) + ' ' + Str(lat ,15, 7) + ')' , 4326)) as distance from geo order by distance Spatial Data in the Entity Framework Prior to Entity Framework 5.0 on .NET 4.5 consuming of the data above required using stored procedures or raw SQL commands to access the spatial data. In Entity Framework 5 however, Microsoft introduced the new DbGeometry and DbGeography types. These immutable location types provide a bunch of functionality for manipulating spatial points using geometry functions which in turn can be used to do common spatial queries like I described in the SQL syntax above. The DbGeography/DbGeometry types are immutable, meaning that you can't write to them once they've been created. They are a bit odd in that you need to use factory methods in order to instantiate them - they have no constructor() and you can't assign to properties like Latitude and Longitude. Creating a Model with Spatial Data Let's start by creating a simple Entity Framework model that includes a Location property of type DbGeography: public class GeoLocationContext : DbContext { public DbSet<GeoLocation> Locations { get; set; } } public class GeoLocation { public int Id { get; set; } public DbGeography Location { get; set; } public string Address { get; set; } } That's all there's to it. When you run this now against SQL Server, you get a Geography field for the Location property, which looks the same as the Location field in the SQL examples earlier. Adding Spatial Data to the Database Next let's add some data to the table that includes some latitude and longitude data. An easy way to find lat/long locations is to use Google Maps to pinpoint your location, then right click and click on What's Here. Click on the green marker to get the GPS coordinates. To add the actual geolocation data create an instance of the GeoLocation type and use the DbGeography.PointFromText() factory method to create a new point to assign to the Location property:[TestMethod] public void AddLocationsToDataBase() { var context = new GeoLocationContext(); // remove all context.Locations.ToList().ForEach( loc => context.Locations.Remove(loc)); context.SaveChanges(); var location = new GeoLocation() { // Create a point using native DbGeography Factory method Location = DbGeography.PointFromText( string.Format("POINT({0} {1})", -121.527200,45.712113) ,4326), Address = "301 15th Street, Hood River" }; context.Locations.Add(location); location = new GeoLocation() { Location = CreatePoint(45.714240, -121.517265), Address = "The Hatchery, Bingen" }; context.Locations.Add(location); location = new GeoLocation() { // Create a point using a helper function (lat/long) Location = CreatePoint(45.708457, -121.514432), Address = "Kaze Sushi, Hood River" }; context.Locations.Add(location); location = new GeoLocation() { Location = CreatePoint(45.722780, -120.209227), Address = "Arlington, OR" }; context.Locations.Add(location); context.SaveChanges(); } As promised, a DbGeography object has to be created with one of the static factory methods provided on the type as the Location.Longitude and Location.Latitude properties are read only. Here I'm using PointFromText() which uses a "Well Known Text" format to specify spatial data. In the first example I'm specifying to create a Point from a longitude and latitude value, using an SRID of 4326 (just like earlier in the SQL examples). You'll probably want to create a helper method to make the creation of Points easier to avoid that string format and instead just pass in a couple of double values. Here's my helper called CreatePoint that's used for all but the first point creation in the sample above:public static DbGeography CreatePoint(double latitude, double longitude) { var text = string.Format(CultureInfo.InvariantCulture.NumberFormat, "POINT({0} {1})", longitude, latitude); // 4326 is most common coordinate system used by GPS/Maps return DbGeography.PointFromText(text, 4326); } Using the helper the syntax becomes a bit cleaner, requiring only a latitude and longitude respectively. Note that my method intentionally swaps the parameters around because Latitude and Longitude is the common format I've seen with mapping libraries (especially Google Mapping/Geolocation APIs with their LatLng type). When the context is changed the data is written into the database using the SQL Geography type which looks the same as in the earlier SQL examples shown. Querying Once you have some location data in the database it's now super easy to query the data and find out the distance between locations. A common query is to ask for a number of locations that are near a fixed point - typically your current location and order it by distance. Using LINQ to Entities a query like this is easy to construct:[TestMethod] public void QueryLocationsTest() { var sourcePoint = CreatePoint(45.712113, -121.527200); var context = new GeoLocationContext(); // find any locations within 5 kilometers ordered by distance var matches = context.Locations .Where(loc => loc.Location.Distance(sourcePoint) < 5000) .OrderBy( loc=> loc.Location.Distance(sourcePoint) ) .Select( loc=> new { Address = loc.Address, Distance = loc.Location.Distance(sourcePoint) }); Assert.IsTrue(matches.Count() > 0); foreach (var location in matches) { Console.WriteLine("{0} ({1:n0} meters)", location.Address, location.Distance); } } This example produces: 301 15th Street, Hood River (0 meters)The Hatchery, Bingen (809 meters)Kaze Sushi, Hood River (1,074 meters) The first point in the database is the same as my source point I'm comparing against so the distance is 0. The other two are within the 5 mile radius, while the Arlington location which is 65 miles or so out is not returned. The result is ordered by distance from closest to furthest away. In the code, I first create a source point that is the basis for comparison. The LINQ query then selects all locations that are within 5km of the source point using the Location.Distance() function, which takes a source point as a parameter. You can either use a pre-defined value as I'm doing here, or compare against another database DbGeography property (say when you have to points in the same database for things like routes). What's nice about this query syntax is that it's very clean and easy to read and understand. You can calculate the distance and also easily order by the distance to provide a result that shows locations from closest to furthest away which is a common scenario for any application that places a user in the context of several locations. It's now super easy to accomplish this. Meters vs. Miles As with the SQL Server functions, the Distance() method returns data in meters, so if you need to work with miles or feet you need to do some conversion. Here are a couple of helpers that might be useful (can be found in GeoUtils.cs of the sample project):/// <summary> /// Convert meters to miles /// </summary> /// <param name="meters"></param> /// <returns></returns> public static double MetersToMiles(double? meters) { if (meters == null) return 0F; return meters.Value * 0.000621371192; } /// <summary> /// Convert miles to meters /// </summary> /// <param name="miles"></param> /// <returns></returns> public static double MilesToMeters(double? miles) { if (miles == null) return 0; return miles.Value * 1609.344; } Using these two helpers you can query on miles like this:[TestMethod] public void QueryLocationsMilesTest() { var sourcePoint = CreatePoint(45.712113, -121.527200); var context = new GeoLocationContext(); // find any locations within 5 miles ordered by distance var fiveMiles = GeoUtils.MilesToMeters(5); var matches = context.Locations .Where(loc => loc.Location.Distance(sourcePoint) <= fiveMiles) .OrderBy(loc => loc.Location.Distance(sourcePoint)) .Select(loc => new { Address = loc.Address, Distance = loc.Location.Distance(sourcePoint) }); Assert.IsTrue(matches.Count() > 0); foreach (var location in matches) { Console.WriteLine("{0} ({1:n1} miles)", location.Address, GeoUtils.MetersToMiles(location.Distance)); } } which produces: 301 15th Street, Hood River (0.0 miles)The Hatchery, Bingen (0.5 miles)Kaze Sushi, Hood River (0.7 miles) Nice 'n simple. .NET 4.5 Only Note that DbGeography and DbGeometry are exclusive to Entity Framework 5.0 (not 4.4 which ships in the same NuGet package or installer) and requires .NET 4.5. That's because the new DbGeometry and DbGeography (and related) types are defined in the 4.5 version of System.Data.Entity which is a CLR assembly and is only updated by major versions of .NET. Why this decision was made to add these types to System.Data.Entity rather than to the frequently updated EntityFramework assembly that would have possibly made this work in .NET 4.0 is beyond me, especially given that there are no native .NET framework spatial types to begin with. I find it also odd that there is no native CLR spatial type. The DbGeography and DbGeometry types are specific to Entity Framework and live on those assemblies. They will also work for general purpose, non-database spatial data manipulation, but then you are forced into having a dependency on System.Data.Entity, which seems a bit silly. There's also a System.Spatial assembly that's apparently part of WCF Data Services which in turn don't work with Entity framework. Another example of multiple teams at Microsoft not communicating and implementing the same functionality (differently) in several different places. Perplexed as a I may be, for EF specific code the Entity framework specific types are easy to use and work well. Working with pre-.NET 4.5 Entity Framework and Spatial Data If you can't go to .NET 4.5 just yet you can also still use spatial features in Entity Framework, but it's a lot more work as you can't use the DbContext directly to manipulate the location data. You can still run raw SQL statements to write data into the database and retrieve results using the same TSQL syntax I showed earlier using Context.Database.ExecuteSqlCommand(). Here's code that you can use to add location data into the database:[TestMethod] public void RawSqlEfAddTest() { string sqlFormat = @"insert into GeoLocations( Location, Address) values ( geography::STGeomFromText('POINT({0} {1})', 4326),@p0 )"; var sql = string.Format(sqlFormat,-121.527200, 45.712113); Console.WriteLine(sql); var context = new GeoLocationContext(); Assert.IsTrue(context.Database.ExecuteSqlCommand(sql,"301 N. 15th Street") > 0); } Here I'm using the STGeomFromText() function to add the location data. Note that I'm using string.Format here, which usually would be a bad practice but is required here. I was unable to use ExecuteSqlCommand() and its named parameter syntax as the longitude and latitude parameters are embedded into a string. Rest assured it's required as the following does not work:string sqlFormat = @"insert into GeoLocations( Location, Address) values ( geography::STGeomFromText('POINT(@p0 @p1)', 4326),@p2 )";context.Database.ExecuteSqlCommand(sql, -121.527200, 45.712113, "301 N. 15th Street") Explicitly assigning the point value with string.format works however. There are a number of ways to query location data. You can't get the location data directly, but you can retrieve the point string (which can then be parsed to get Latitude and Longitude) and you can return calculated values like distance. Here's an example of how to retrieve some geo data into a resultset using EF's and SqlQuery method:[TestMethod] public void RawSqlEfQueryTest() { var sqlFormat = @" DECLARE @s geography SET @s = geography:: STGeomFromText('POINT({0} {1})' , 4326); SELECT Address, Location.ToString() as GeoString, @s.STDistance( Location) as Distance FROM GeoLocations ORDER BY Distance"; var sql = string.Format(sqlFormat, -121.527200, 45.712113); var context = new GeoLocationContext(); var locations = context.Database.SqlQuery<ResultData>(sql); Assert.IsTrue(locations.Count() > 0); foreach (var location in locations) { Console.WriteLine(location.Address + " " + location.GeoString + " " + location.Distance); } } public class ResultData { public string GeoString { get; set; } public double Distance { get; set; } public string Address { get; set; } } Hopefully you don't have to resort to this approach as it's fairly limited. Using the new DbGeography/DbGeometry types makes this sort of thing so much easier. When I had to use code like this before I typically ended up retrieving data pks only and then running another query with just the PKs to retrieve the actual underlying DbContext entities. This was very inefficient and tedious but it did work. Summary For the current project I'm working on we actually made the switch to .NET 4.5 purely for the spatial features in EF 5.0. This app heavily relies on spatial queries and it was worth taking a chance with pre-release code to get this ease of integration as opposed to manually falling back to stored procedures or raw SQL string queries to return spatial specific queries. Using native Entity Framework code makes life a lot easier than the alternatives. It might be a late addition to Entity Framework, but it sure makes location calculations and storage easy. Where do you want to go today? ;-) Resources Download Sample Project© Rick Strahl, West Wind Technologies, 2005-2012Posted in ADO.NET Sql Server .NET Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Tips on Migrating from AquaLogic .NET Accelerator to WebCenter WSRP Producer for .NET

- by user647124

This year I embarked on a journey to migrate a group of ASP.NET web applications developed to integrate with WebLogic Portal 9.2 via the AquaLogic® Interaction .NET Application Accelerator 1.0 to instead use the Oracle WebCenter WSRP Producer for .NET and integrated with WebLogic Portal 10.3.4. It has been a very winding path and this blog entry is intended to share both the lessons learned and relevant approaches that led to those learnings. Like most journeys of discovery, it was not a direct path, and there are notes to let you know when it is practical to skip a section if you are in a hurry to get from here to there. For the Curious From the perspective of necessity, this section would be better at the end. If it were there, though, it would probably be read by far fewer people, including those that are actually interested in these types of sections. Those in a hurry may skip past and be none the worst for it in dealing with the hands-on bits of performing a migration from .NET Accelerator to WSRP Producer. For others who want to talk about why they did what they did after they did it, or just want to know for themselves, enjoy. A Brief (and edited) History of the WSRP for .NET Technologies (as Relevant to the this Post) Note: This section is for those who are curious about why the migration path is not as simple as many other Oracle technologies. You can skip this section in its entirety and still be just as competent in performing a migration as if you had read it. The currently deployed architecture that was to be migrated and upgraded achieved initial integration between .NET and J2EE over the WSRP protocol through the use of The AquaLogic Interaction .NET Application Accelerator. The .NET Accelerator allowed the applications that were written in ASP.NET and deployed on a Microsoft Internet Information Server (IIS) to interact with a WebLogic Portal application deployed on a WebLogic (J2EE application) Server (both version 9.2, the state of the art at the time of its creation). At the time this architectural decision for the application was made, both the AquaLogic and WebLogic brands were owned by BEA Systems. The AquaLogic brand included products acquired by BEA through the acquisition of Plumtree, whose flagship product was a portal platform available in both J2EE and .NET versions. As part of this dual technology support an adaptor was created to facilitate the use of WSRP as a communication protocol where customers wished to integrate components from both versions of the Plumtree portal. The adapter evolved over several product generations to include a broad array of both standard and proprietary WSRP integration capabilities. Later, BEA Systems was acquired by Oracle. Over the course of several years Oracle has acquired a large number of portal applications and has taken the strategic direction to migrate users of these myriad (and formerly competitive) products to the Oracle WebCenter technology stack. As part of Oracle’s strategic technology roadmap, older portal products are being schedule for end of life, including the portal products that were part of the BEA acquisition. The .NET Accelerator has been modified over a very long period of time with features driven by users of that product and developed under three different vendors (each a direct competitor in the same solution space prior to merger). The Oracle WebCenter WSRP Producer for .NET was introduced much more recently with the key objective to specifically address the needs of the WebCenter customers developing solutions accessible through both J2EE and .NET platforms utilizing the WSRP specifications. The Oracle Product Development Team also provides these insights on the drivers for developing the WSRP Producer: ***************************************** Support for ASP.NET AJAX. Controls using the ASP.NET AJAX script manager do not function properly in the Application Accelerator for .NET. Support 2 way SSL in WLP. This was not possible with the proxy/bridge set up in the existing Application Accelerator for .NET. Allow developers to code portlets (Web Parts) using the .NET framework rather than a proprietary framework. Developers had to use the Application Accelerator for .NET plug-ins to Visual Studio to manage preferences and profile data. This is now replaced with the .NET Framework Personalization (for preferences) and Profile providers. The WSRP Producer for .NET was created as a new way of developing .NET portlets. It was never designed to be an upgrade path for the Application Accelerator for .NET. .NET developers would create new .NET portlets with the WSRP Producer for .NET and leave any existing .NET portlets running in the Application Accelerator for .NET. ***************************************** The advantage to creating a new solution for WSRP is a product that is far easier for Oracle to maintain and support which in turn improves quality, reliability and maintainability for their customers. No changes to J2EE applications consuming the WSRP portlets previously rendered by the.NET Accelerator is required to migrate from the Aqualogic WSRP solution. For some customers using the .NET Accelerator the challenge is adapting their current .NET applications to work with the WSRP Producer (or any other WSRP adapter as they are proprietary by nature). Part of this adaptation is the need to deploy the .NET applications as a child to the WSRP producer web application as root. Differences between .NET Accelerator and WSRP Producer Note: This section is for those who are curious about why the migration is not as pluggable as something such as changing security providers in WebLogic Server. You can skip this section in its entirety and still be just as competent in performing a migration as if you had read it. The basic terminology used to describe the participating applications in a WSRP environment are the same when applied to either the .NET Accelerator or the WSRP Producer: Producer and Consumer. In both cases the .NET application serves as what is referred to as a WSRP environment as the Producer. The difference lies in how the two adapters create the WSRP translation of the .NET application. The .NET Accelerator, as the name implies, is meant to serve as a quick way of adding WSRP capability to a .NET application. As such, at a high level, the .NET Accelerator behaves as a proxy for requests between the .NET application and the WSRP Consumer. A WSRP request is sent from the consumer to the .NET Accelerator, the.NET Accelerator transforms this request into an ASP.NET request, receives the response, then transforms the response into a WSRP response. The .NET Accelerator is deployed as a stand-alone application on IIS. The WSRP Producer is deployed as a parent application on IIS and all ASP.NET modules that will be made available over WSRP are deployed as children of the WSRP Producer application. In this manner, the WSRP Producer acts more as a Request Filter than a proxy in the WSRP transactions between Producer and Consumer. Highly Recommended Enabling Logging Note: You can skip this section now, but you will most likely want to come back to it later, so why not just read it now? Logging is very helpful in tracking down the causes of any anomalies during testing of migrated portlets. To enable the WSRP Producer logging, update the Application_Start method in the Global.asax.cs for your .NET application by adding log4net.Config.XmlConfigurator.Configure(); IIS logs will usually (in a standard configuration) be in a sub folder under C:\WINDOWS\system32\LogFiles\W3SVC. WSRP Producer logs will be found at C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdefault\Logs\WSRPProducer.log InputTrace.webinfo and OutputTrace.webinfo are located under C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdefault and can be useful in debugging issues related to markup transformations. Things You Must Do Merge Web.Config Note: If you have been skipping all the sections that you can, now is the time to stop and pay attention J Because the existing .NET application will become a sub-application to the WSRP Producer, you will want to merge required settings from the existing Web.Config to the one in the WSRP Producer. Use the WSRP Producer Master Page The Master Page installed for the WSRP Producer provides common, hiddenform fields and JavaScripts to facilitate portlet instance management and display configuration when the child page is being rendered over WSRP. You add the Master Page by including it in the <@ Page declaration with MasterPageFile="~/portlets/Resources/MasterPages/WSRP.Master" . You then replace: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" > <HTML> <HEAD> With <asp:Content ID="ContentHead1" ContentPlaceHolderID="wsrphead" Runat="Server"> And </HEAD> <body> <form id="theForm" method="post" runat="server"> With </asp:Content> <asp:Content ID="ContentBody1" ContentPlaceHolderID="Main" Runat="Server"> And finally </form> </body> </HTML> With </asp:Content> In the event you already use Master Pages, adapt your existing Master Pages to be sub masters. See Nested ASP.NET Master Pages for a detailed reference of how to do this. It Happened to Me, It Might Happen to You…Or Not Watch for Use of Session or Request in OnInit In the event the .NET application being modified has pages developed to assume the user has been authenticated in an earlier page request there may be direct or indirect references in the OnInit method to request or session objects that may not have been created yet. This will vary from application to application, so the recommended approach is to test first. If there is an issue with a page running as a WSRP portlet then check for potential references in the OnInit method (including references by methods called within OnInit) to session or request objects. If there are, the simplest solution is to create a new method and then call that method once the necessary object(s) is fully available. I find doing this at the start of the Page_Load method to be the simplest solution. Case Sensitivity .NET languages are not case sensitive, but Java is. This means it is possible to have many variations of SRC= and src= or .JPG and .jpg. The preferred solution is to make these mark up instances all lower case in your .NET application. This will allow the default Rewriter rules in wsrp-producer.xml to work as is. If this is not practical, then make duplicates of any rules where an issue is occurring due to upper or mixed case usage in the .NET application markup and match the case in use with the duplicate rule. For example: <RewriterRule> <LookFor>(href=\"([^\"]+)</LookFor> <ChangeToAbsolute>true</ChangeToAbsolute> <ApplyTo>.axd,.css</ApplyTo> <MakeResource>true</MakeResource> </RewriterRule> May need to be duplicated as: <RewriterRule> <LookFor>(HREF=\"([^\"]+)</LookFor> <ChangeToAbsolute>true</ChangeToAbsolute> <ApplyTo>.axd,.css</ApplyTo> <MakeResource>true</MakeResource> </RewriterRule> While it is possible to write a regular expression that will handle mixed case usage, it would be long and strenous to test and maintain, so the recommendation is to use duplicate rules. Is it Still Relative? Some .NET applications base relative paths with a fixed root location. With the introduction of the WSRP Producer, the root has moved up one level. References to ~/ will need to be updated to ~/portlets and many ../ paths will need another ../ in front. I Can See You But I Can’t Find You This issue was first discovered while debugging modules with code that referenced the form on a page from the code-behind by name and/or id. The initial error presented itself as run-time error that was difficult to interpret over WSRP but seemed clear when run as straight ASP.NET as it indicated that the object with the form name did not exist. Since the form name was no longer valid after implementing the WSRP Master Page, the likely fix seemed to simply update the references in the code. However, as the WSRP Master Page is external to the code, a compile time error resulted: Error 155 The name 'form1' does not exist in the current context C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdefault\portlets\legacywebsite\module\Screens \Reporting.aspx.cs 51 52 legacywebsite.module Much hair-pulling research later it was discovered that it was the use of the FindControl method causing the issue. FindControl doesn’t work quite as expected once a Master Page has been introduced as the controls become embedded in controls, require a recursion to find them that is not part of the FindControl method. In code where the page form is referenced by name, there are two steps to the solution. First, the form needs to be referenced in code generically with Page.Form. For example, this: ToggleControl ctrl = new ToggleControl(frmManualEntry, FunctionLibrary.ParseArrayLst(userObj.Roles)); Becomes this: ToggleControl ctrl = new ToggleControl(Page.Form, FunctionLibrary.ParseArrayLst(userObj.Roles)); Generally the form id is referenced in most ASP.NET applications as a path to a control on the form. To reach the control once a MasterPage has been added requires an additional method to recurse through the controls collections within the form and find the control ID. The following method (found at Rick Strahl's Web Log) corrects this very nicely: public static Control FindControlRecursive(Control Root, string Id) { if (Root.ID == Id) return Root; foreach (Control Ctl in Root.Controls) { Control FoundCtl = FindControlRecursive(Ctl, Id); if (FoundCtl != null) return FoundCtl; } return null; } Where the form name is not referenced, simply using the FindControlRecursive method in place of FindControl will be all that is necessary. Following the second part of the example referenced earlier, the method called with Page.Form changes its value extraction code block from this: Label lblErrMsg = (Label)frmRef.FindControl("lblBRMsg" To this: Label lblErrMsg = (Label) FunctionLibrary.FindControlRecursive(frmRef, "lblBRMsg" The Master That Won’t Step Aside In most migrations it is preferable to make as few changes as possible. In one case I ran across an existing Master Page that would not function as a sub-Master Page. While it would probably have been educational to trace down why, the expedient process of updating it to take the place of the WSRP Master Page is the route I took. The changes are highlighted below: … <asp:ContentPlaceHolder ID="wsrphead" runat="server"></asp:ContentPlaceHolder> </head> <body leftMargin="0" topMargin="0"> <form id="TheForm" runat="server"> <input type="hidden" name="key" id="key" value="" /> <input type="hidden" name="formactionurl" id="formactionurl" value="" /> <input type="hidden" name="handle" id="handle" value="" /> <asp:ScriptManager ID="ScriptManager1" runat="server" EnablePartialRendering="true" > </asp:ScriptManager> This approach did not work for all existing Master Pages, but fortunately all of the other existing Master Pages I have run across worked fine as a sub-Master to the WSRP Master Page. Moving On In Enterprise Portals, even after you get everything working, the work is not finished. Next you need to get it where everyone will work with it. Migration Planning Providing that the server where IIS is running is adequately sized, it is possible to run both the .NET Accelerator and the WSRP Producer on the same server during the upgrade process. The upgrade can be performed incrementally, i.e., one portlet at a time, if server administration processes support it. Those processes would include the ability to manage a second producer in the consuming portal and to change over individual portlet instances from one provider to the other. If processes or requirements demand that all portlets be cut over at the same time, it needs to be determined if this cut over should include a new producer, updating all of the portlets in the consumer, or if the WSRP Producer portlet configuration must maintain the naming conventions used by the .NET Accelerator and simply change the WSRP end point configured in the consumer. In some enterprises it may even be necessary to maintain the same WSDL end point, at which point the IIS configuration will be where the updates occur. The downside to such a requirement is that it makes rolling back very difficult, should the need arise. Location, Location, Location Not everyone wants the web application to have the descriptively obvious wsrpdefault location, or needs to create a second WSRP site on the same server. The instructions below are from the product team and, while targeted towards making a second site, will work for creating a site with a different name and then remove the old site. You can also change just the name in IIS. Manually Creating a WSRP Producer Site Instructions (NOTE: all executables used are the same ones used by the installer and “wsrpdev” will be the name of the new instance): 1. Copy C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdefault to C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdev. 2. Bring up a command window as an administrator 3. Run C:\Oracle\Middleware\WSRPProducerForDotNet\uninstall_resources\IISAppAccelSiteCreator.exe install WSRPProducers wsrpdev "C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdev" 8678 2.0.50727 4. Run C:\Oracle\Middleware\WSRPProducerForDotNet\uninstall_resources\PermManage.exe add FileSystem C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdev "NETWORK SERVICE" 3 1 5. Run C:\Oracle\Middleware\WSRPProducerForDotNet\uninstall_resources\PermManage.exe add FileSystem C:\Oracle\Middleware\WSRPProducerForDotNet\wsrpdev EVERYONE 1 1 6. Open up C:\Oracle\Middleware\WSRPProducerForDotNet\wsdl\1.0\WSRPService.wsdl and replace wsrpdefault with wsrpdev 7. Open up C:\Oracle\Middleware\WSRPProducerForDotNet\wsdl\2.0\WSRPService.wsdl and replace wsrpdefault with wsrpdev Tests: 1. Bring up a browser on the host itself and go to http://localhost:8678/wsrpdev/wsdl/1.0/WSRPService.wsdl and make sure that the URLs in the XML returned include the wsrpdev changes you made in step 6. 2. Bring up a browser on the host itself and see if the default sample comes up: http://localhost:8678/wsrpdev/portlets/ASPNET_AJAX_sample/default.aspx 3. Register the producer in WLP and test the portlet. Changing the Port used by WSRP Producer The pre-configured port for the WSRP Producer is 8678. You can change this port by updating both the IIS configuration and C:\Oracle\Middleware\WSRPProducerForDotNet\[WSRP_APP_NAME]\wsdl\1.0\WSRPService.wsdl. Do You Need to Migrate? Oracle Premier Support ended in November of 2010 for AquaLogic Interaction .NET Application Accelerator 1.x and Extended Support ends in November 2012 (see http://www.oracle.com/us/support/lifetime-support/lifetime-support-software-342730.html for other related dates). This means that integration with products released after November of 2010 is not supported. If having such support is the policy within your enterprise, you do indeed need to migrate. If changes in your enterprise cause your current solution with the .NET Accelerator to no longer function properly, you may need to migrate. Migration is a choice, and if the goals of your enterprise are to take full advantage of newer technologies then migration is certainly one activity you should be planning for.

Read the article
ASP.NET Frameworks and Raw Throughput Performance

- by Rick Strahl

A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012 removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this: It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type. ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results: The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Syncing Data with a Server using Silverlight and HTTP Polling Duplex

- by dwahlin

Many applications have the need to stay in-sync with data provided by a service. Although web applications typically rely on standard polling techniques to check if data has changed, Silverlight provides several interesting options for keeping an application in-sync that rely on server “push” technologies. A few years back I wrote several blog posts covering different “push” technologies available in Silverlight that rely on sockets or HTTP Polling Duplex. We recently had a project that looked like it could benefit from pushing data from a server to one or more clients so I thought I’d revisit the subject and provide some updates to the original code posted. If you’ve worked with AJAX before in Web applications then you know that until browsers fully support web sockets or other duplex (bi-directional communication) technologies that it’s difficult to keep applications in-sync with a server without relying on polling. The problem with polling is that you have to check for changes on the server on a timed-basis which can often be wasteful and take up unnecessary resources. With server “push” technologies, data can be pushed from the server to the client as it changes. Once the data is received, the client can update the user interface as appropriate. Using “push” technologies allows the client to listen for changes from the data but stay 100% focused on client activities as opposed to worrying about polling and asking the server if anything has changed. Silverlight provides several options for pushing data from a server to a client including sockets, TCP bindings and HTTP Polling Duplex. Each has its own strengths and weaknesses as far as performance and setup work with HTTP Polling Duplex arguably being the easiest to setup and get going. In this article I’ll demonstrate how HTTP Polling Duplex can be used in Silverlight 4 applications to push data and show how you can create a WCF server that provides an HTTP Polling Duplex binding that a Silverlight client can consume. What is HTTP Polling Duplex? Technologies that allow data to be pushed from a server to a client rely on duplex functionality. Duplex (or bi-directional) communication allows data to be passed in both directions. A client can call a service and the server can call the client. HTTP Polling Duplex (as its name implies) allows a server to communicate with a client without forcing the client to constantly poll the server. It has the benefit of being able to run on port 80 making setup a breeze compared to the other options which require specific ports to be used and cross-domain policy files to be exposed on port 943 (as with sockets and TCP bindings). Having said that, if you’re looking for the best speed possible then sockets and TCP bindings are the way to go. But, they’re not the only game in town when it comes to duplex communication. The first time I heard about HTTP Polling Duplex (initially available in Silverlight 2) I wasn’t exactly sure how it was any better than standard polling used in AJAX applications. I read the Silverlight SDK, looked at various resources and generally found the following definition unhelpful as far as understanding the actual benefits that HTTP Polling Duplex provided: "The Silverlight client periodically polls the service on the network layer, and checks for any new messages that the service wants to send on the callback channel. The service queues all messages sent on the client callback channel and delivers them to the client when the client polls the service." Although the previous definition explained the overall process, it sounded as if standard polling was used. Fortunately, Microsoft’s Scott Guthrie provided me with a more clear definition several years back that explains the benefits provided by HTTP Polling Duplex quite well (used with his permission): "The [HTTP Polling Duplex] duplex support does use polling in the background to implement notifications – although the way it does it is different than manual polling. It initiates a network request, and then the request is effectively “put to sleep” waiting for the server to respond (it doesn’t come back immediately). The server then keeps the connection open but not active until it has something to send back (or the connection times out after 90 seconds – at which point the duplex client will connect again and wait). This way you are avoiding hitting the server repeatedly – but still get an immediate response when there is data to send." After hearing Scott’s definition the light bulb went on and it all made sense. A client makes a request to a server to check for changes, but instead of the request returning immediately, it parks itself on the server and waits for data. It’s kind of like waiting to pick up a pizza at the store. Instead of calling the store over and over to check the status, you sit in the store and wait until the pizza (the request data) is ready. Once it’s ready you take it back home (to the client). This technique provides a lot of efficiency gains over standard polling techniques even though it does use some polling of its own as a request is initially made from a client to a server. So how do you implement HTTP Polling Duplex in your Silverlight applications? Let’s take a look at the process by starting with the server. Creating an HTTP Polling Duplex WCF Service Creating a WCF service that exposes an HTTP Polling Duplex binding is straightforward as far as coding goes. Add some one way operations into an interface, create a client callback interface and you’re ready to go. The most challenging part comes into play when configuring the service to properly support the necessary binding and that’s more of a cut and paste operation once you know the configuration code to use. To create an HTTP Polling Duplex service you’ll need to expose server-side and client-side interfaces and reference the System.ServiceModel.PollingDuplex assembly (located at C:\Program Files (x86)\Microsoft SDKs\Silverlight\v4.0\Libraries\Server on my machine) in the server project. For the demo application I upgraded a basketball simulation service to support the latest polling duplex assemblies. The service simulates a simple basketball game using a Game class and pushes information about the game such as score, fouls, shots and more to the client as the game changes over time. Before jumping too far into the game push service, it’s important to discuss two interfaces used by the service to communicate in a bi-directional manner. The first is called IGameStreamService and defines the methods/operations that the client can call on the server (see Listing 1). The second is IGameStreamClient which defines the callback methods that a server can use to communicate with a client (see Listing 2). [ServiceContract(Namespace = "Silverlight", CallbackContract = typeof(IGameStreamClient))] public interface IGameStreamService { [OperationContract(IsOneWay = true)] void GetTeamData(); } Listing 1. The IGameStreamService interface defines server operations that can be called on the server. [ServiceContract] public interface IGameStreamClient { [OperationContract(IsOneWay = true)] void ReceiveTeamData(List<Team> teamData); [OperationContract(IsOneWay = true, AsyncPattern=true)] IAsyncResult BeginReceiveGameData(GameData gameData, AsyncCallback callback, object state); void EndReceiveGameData(IAsyncResult result); } Listing 2. The IGameStreamClient interfaces defines client operations that a server can call. The IGameStreamService interface is decorated with the standard ServiceContract attribute but also contains a value for the CallbackContract property. This property is used to define the interface that the client will expose (IGameStreamClient in this example) and use to receive data pushed from the service. Notice that each OperationContract attribute in both interfaces sets the IsOneWay property to true. This means that the operation can be called and passed data as appropriate, however, no data will be passed back. Instead, data will be pushed back to the client as it’s available. Looking through the IGameStreamService interface you can see that the client can request team data whereas the IGameStreamClient interface allows team and game data to be received by the client. One interesting point about the IGameStreamClient interface is the inclusion of the AsyncPattern property on the BeginReceiveGameData operation. I initially created this operation as a standard one way operation and it worked most of the time. However, as I disconnected clients and reconnected new ones game data wasn’t being passed properly. After researching the problem more I realized that because the service could take up to 7 seconds to return game data, things were getting hung up. By setting the AsyncPattern property to true on the BeginReceivedGameData operation and providing a corresponding EndReceiveGameData operation I was able to get around this problem and get everything running properly. I’ll provide more details on the implementation of these two methods later in this post. Once the interfaces were created I moved on to the game service class. The first order of business was to create a class that implemented the IGameStreamService interface. Since the service can be used by multiple clients wanting game data I added the ServiceBehavior attribute to the class definition so that I could set its InstanceContextMode to InstanceContextMode.Single (in effect creating a Singleton service object). Listing 3 shows the game service class as well as its fields and constructor. [ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple, InstanceContextMode = InstanceContextMode.Single)] public class GameStreamService : IGameStreamService { object _Key = new object(); Game _Game = null; Timer _Timer = null; Random _Random = null; Dictionary<string, IGameStreamClient> _ClientCallbacks = new Dictionary<string, IGameStreamClient>(); static AsyncCallback _ReceiveGameDataCompleted = new AsyncCallback(ReceiveGameDataCompleted); public GameStreamService() { _Game = new Game(); _Timer = new Timer { Enabled = false, Interval = 2000, AutoReset = true }; _Timer.Elapsed += new ElapsedEventHandler(_Timer_Elapsed); _Timer.Start(); _Random = new Random(); }} Listing 3. The GameStreamService implements the IGameStreamService interface which defines a callback contract that allows the service class to push data back to the client. By implementing the IGameStreamService interface, GameStreamService must supply a GetTeamData() method which is responsible for supplying information about the teams that are playing as well as individual players. GetTeamData() also acts as a client subscription method that tracks clients wanting to receive game data. Listing 4 shows the GetTeamData() method. public void GetTeamData() { //Get client callback channel var context = OperationContext.Current; var sessionID = context.SessionId; var currClient = context.GetCallbackChannel<IGameStreamClient>(); context.Channel.Faulted += Disconnect; context.Channel.Closed += Disconnect; IGameStreamClient client; if (!_ClientCallbacks.TryGetValue(sessionID, out client)) { lock (_Key) { _ClientCallbacks[sessionID] = currClient; } } currClient.ReceiveTeamData(_Game.GetTeamData()); //Start timer which when fired sends updated score information to client if (!_Timer.Enabled) { _Timer.Enabled = true; } } Listing 4. The GetTeamData() method subscribes a given client to the game service and returns. The key the line of code in the GetTeamData() method is the call to GetCallbackChannel<IGameStreamClient>(). This method is responsible for accessing the calling client’s callback channel. The callback channel is defined by the IGameStreamClient interface shown earlier in Listing 2 and used by the server to communicate with the client. Before passing team data back to the client, GetTeamData() grabs the client’s session ID and checks if it already exists in the _ClientCallbacks dictionary object used to track clients wanting callbacks from the server. If the client doesn’t exist it adds it into the collection. It then pushes team data from the Game class back to the client by calling ReceiveTeamData(). Since the service simulates a basketball game, a timer is then started if it’s not already enabled which is then used to randomly send data to the client. When the timer fires, game data is pushed down to the client. Listing 5 shows the _Timer_Elapsed() method that is called when the timer fires as well as the SendGameData() method used to send data to the client. void _Timer_Elapsed(object sender, ElapsedEventArgs e) { int interval = _Random.Next(3000, 7000); lock (_Key) { _Timer.Interval = interval; _Timer.Enabled = false; } SendGameData(_Game.GetGameData()); } private void SendGameData(GameData gameData) { var cbs = _ClientCallbacks.Where(cb => ((IContextChannel)cb.Value).State == CommunicationState.Opened); for (int i = 0; i < cbs.Count(); i++) { var cb = cbs.ElementAt(i).Value; try { cb.BeginReceiveGameData(gameData, _ReceiveGameDataCompleted, cb); } catch (TimeoutException texp) { //Log timeout error } catch (CommunicationException cexp) { //Log communication error } } lock (_Key) _Timer.Enabled = true; } private static void ReceiveGameDataCompleted(IAsyncResult result) { try { ((IGameStreamClient)(result.AsyncState)).EndReceiveGameData(result); } catch (CommunicationException) { // empty } catch (TimeoutException) { // empty } } LIsting 5. _Timer_Elapsed is used to simulate time in a basketball game. When _Timer_Elapsed() fires the SendGameData() method is called which iterates through the clients wanting to be notified of changes. As each client is identified, their respective BeginReceiveGameData() method is called which ultimately pushes game data down to the client. Recall that this method was defined in the client callback interface named IGameStreamClient shown earlier in Listing 2. Notice that BeginReceiveGameData() accepts _ReceiveGameDataCompleted as its second parameter (an AsyncCallback delegate defined in the service class) and passes the client callback as the third parameter. The initial version of the sample application had a standard ReceiveGameData() method in the client callback interface. However, sometimes the client callbacks would work properly and sometimes they wouldn’t which was a little baffling at first glance. After some investigation I realized that I needed to implement an asynchronous pattern for client callbacks to work properly since 3 – 7 second delays are occurring as a result of the timer. Once I added the BeginReceiveGameData() and ReceiveGameDataCompleted() methods everything worked properly since each call was handled in an asynchronous manner. The final task that had to be completed to get the server working properly with HTTP Polling Duplex was adding configuration code into web.config. In the interest of brevity I won’t post all of the code here since the sample application includes everything you need. However, Listing 6 shows the key configuration code to handle creating a custom binding named pollingDuplexBinding and associate it with the service’s endpoint. <bindings> <customBinding> <binding name="pollingDuplexBinding"> <binaryMessageEncoding /> <pollingDuplex maxPendingSessions="2147483647" maxPendingMessagesPerSession="2147483647" inactivityTimeout="02:00:00" serverPollTimeout="00:05:00"/> <httpTransport /> </binding> </customBinding> </bindings> <services> <service name="GameService.GameStreamService" behaviorConfiguration="GameStreamServiceBehavior"> <endpoint address="" binding="customBinding" bindingConfiguration="pollingDuplexBinding" contract="GameService.IGameStreamService"/> <endpoint address="mex" binding="mexHttpBinding" contract="IMetadataExchange" /> </service> </services> Listing 6. Configuring an HTTP Polling Duplex binding in web.config and associating an endpoint with it. Calling the Service and Receiving “Pushed” Data Calling the service and handling data that is pushed from the server is a simple and straightforward process in Silverlight. Since the service is configured with a MEX endpoint and exposes a WSDL file, you can right-click on the Silverlight project and select the standard Add Service Reference item. After the web service proxy is created you may notice that the ServiceReferences.ClientConfig file only contains an empty configuration element instead of the normal configuration elements created when creating a standard WCF proxy. You can certainly update the file if you want to read from it at runtime but for the sample application I fed the service URI directly to the service proxy as shown next: var address = new EndpointAddress("http://localhost.:5661/GameStreamService.svc"); var binding = new PollingDuplexHttpBinding(); _Proxy = new GameStreamServiceClient(binding, address); _Proxy.ReceiveTeamDataReceived += _Proxy_ReceiveTeamDataReceived; _Proxy.ReceiveGameDataReceived += _Proxy_ReceiveGameDataReceived; _Proxy.GetTeamDataAsync(); This code creates the proxy and passes the endpoint address and binding to use to its constructor. It then wires the different receive events to callback methods and calls GetTeamDataAsync(). Calling GetTeamDataAsync() causes the server to store the client in the server-side dictionary collection mentioned earlier so that it can receive data that is pushed. As the server-side timer fires and game data is pushed to the client, the user interface is updated as shown in Listing 7. Listing 8 shows the _Proxy_ReceiveGameDataReceived() method responsible for handling the data and calling UpdateGameData() to process it. Listing 7. The Silverlight interface. Game data is pushed from the server to the client using HTTP Polling Duplex. void _Proxy_ReceiveGameDataReceived(object sender, ReceiveGameDataReceivedEventArgs e) { UpdateGameData(e.gameData); } private void UpdateGameData(GameData gameData) { //Update Score this.tbTeam1Score.Text = gameData.Team1Score.ToString(); this.tbTeam2Score.Text = gameData.Team2Score.ToString(); //Update ball visibility if (gameData.Action != ActionsEnum.Foul) { if (tbTeam1.Text == gameData.TeamOnOffense) { AnimateBall(this.BB1, this.BB2); } else //Team 2 { AnimateBall(this.BB2, this.BB1); } } if (this.lbActions.Items.Count > 9) this.lbActions.Items.Clear(); this.lbActions.Items.Add(gameData.LastAction); if (this.lbActions.Visibility == Visibility.Collapsed) this.lbActions.Visibility = Visibility.Visible; } private void AnimateBall(Image onBall, Image offBall) { this.FadeIn.Stop(); Storyboard.SetTarget(this.FadeInAnimation, onBall); Storyboard.SetTarget(this.FadeOutAnimation, offBall); this.FadeIn.Begin(); } Listing 8. As the server pushes game data, the client’s _Proxy_ReceiveGameDataReceived() method is called to process the data. In a real-life application I’d go with a ViewModel class to handle retrieving team data, setup data bindings and handle data that is pushed from the server. However, for the sample application I wanted to focus on HTTP Polling Duplex and keep things as simple as possible. Summary Silverlight supports three options when duplex communication is required in an application including TCP bindins, sockets and HTTP Polling Duplex. In this post you’ve seen how HTTP Polling Duplex interfaces can be created and implemented on the server as well as how they can be consumed by a Silverlight client. HTTP Polling Duplex provides a nice way to “push” data from a server while still allowing the data to flow over port 80 or another port of your choice. Sample Application Download

Read the article
256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

- by Alan Smith

For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour. This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005> define wooden texture { noise surface { ambient 0.2 diffuse 0.7 specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1 lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7 } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } } viewpoint { from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60 resolution 640, 480 aspect 1.6 image_format 0 } light <-10, 30, 20> light <-10, 30, -20> object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden } object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome } After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue } In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000 As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72 Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: · On-Premise o Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. · Windows Azure o Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment. The architecture of each worker role is shown below. The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" > <WorkerRole name="CloudRayWorkerRole" vmsize="Small"> <Imports> </Imports> <ConfigurationSettings> <Setting name="DataConnectionString" /> </ConfigurationSettings> <LocalResources> <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" /> </LocalResources> </WorkerRole> </ServiceDefinition> The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1. The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2. PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3. DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4. The JPG file is uploaded from local storage to the images blob container. 5. A message is placed on the images queue to indicate a new image is available for download. 6. The job message is deleted from the job queue. 7. The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() { // Set environment variables string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation); string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation); LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder"); string localStorageRootPath = rayStorage.RootPath; JobQueue jobQueue = new JobQueue("renderjobs"); JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs"); CloudRayBlob sceneBlob = new CloudRayBlob("scenes"); CloudRayBlob imageBlob = new CloudRayBlob("images"); RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource(); Frames = 0; while (true) { // Get the render job from the queue CloudQueueMessage jobMsg = jobQueue.Get(); if (jobMsg != null) { // Get the file details string sceneFile = jobMsg.AsString; string tgaFile = sceneFile.Replace(".pi", ".tga"); string jpgFile = sceneFile.Replace(".pi", ".jpg"); string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile); string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile); string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile); // Copy the scene file to local storage sceneBlob.DownloadFile(sceneFilePath); // Run the ray tracer. string polyrayArguments = string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath); Process polyRayProcess = new Process(); polyRayProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath); polyRayProcess.StartInfo.Arguments = polyrayArguments; polyRayProcess.Start(); polyRayProcess.WaitForExit(); // Convert the image string dtaArguments = string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath)); Process dtaProcess = new Process(); dtaProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath); dtaProcess.StartInfo.Arguments = dtaArguments; dtaProcess.Start(); dtaProcess.WaitForExit(); // Upload the image to blob storage imageBlob.UploadFile(jpgFilePath); // Add a download job. downloadQueue.Add(jpgFile); // Delete the render job message jobQueue.Delete(jobMsg); Frames++; } else { Thread.Sleep(1000); } // Log the worker role activity. roleLifecycleDataSource.Alive ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames); } } Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage. public class RoleLifecycle : TableServiceEntity { public string ServerName { get; set; } public string Status { get; set; } public DateTime StartTime { get; set; } public DateTime EndTime { get; set; } public long SecondsRunning { get; set; } public DateTime LastActiveTime { get; set; } public int Frames { get; set; } public string Comment { get; set; } public RoleLifecycle() { } public RoleLifecycle(string roleName) { PartitionKey = roleName; RowKey = Utils.GetAscendingRowKey(); Status = "Started"; StartTime = DateTime.UtcNow; LastActiveTime = StartTime; EndTime = StartTime; SecondsRunning = 0; Frames = 0; } } A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="16" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames. Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation. With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes. At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="256" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames. Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds. We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes. The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each. Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. · Windows Azure can provide massive compute power, on demand, in a matter of minutes. · The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. · Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. · No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. · Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. · Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. · Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. · Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. · If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. · Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. · Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

Read the article
ANTS Memory Profiler 7.0 Review

- by Michael B. McLaughlin

(This is my first review as a part of the GeeksWithBlogs.net Influencers program. It’s a program in which I (and the others who have been selected for it) get the opportunity to check out new products and services and write reviews about them. We don’t get paid for this, but we do generally get to keep a copy of the software or retain an account for some period of time on the service that we review. In this case I received a copy of Red Gate Software’s ANTS Memory Profiler 7.0, which was released in January. I don’t have any upgrade rights nor is my review guided, restrained, influenced, or otherwise controlled by Red Gate or anyone else. But I do get to keep the software license. I will always be clear about what I received whenever I do a review – I leave it up to you to decide whether you believe I can be objective. I believe I can be. If I used something and really didn’t like it, keeping a copy of it wouldn’t be worth anything to me. In that case though, I would simply uninstall/deactivate/whatever the software or service and tell the company what I didn’t like about it so they could (hopefully) make it better in the future. I don’t think it’d be polite to write up a terrible review, nor do I think it would be a particularly good use of my time. There are people who get paid for a living to review things, so I leave it to them to tell you what they think is bad and why. I’ll only spend my time telling you about things I think are good.) Overview of Common .NET Memory Problems When coming to land of managed memory from the wilds of unmanaged code, it’s easy to say to one’s self, “Wow! Now I never have to worry about memory problems again!” But this simply isn’t true. Managed code environments, such as .NET, make many, many things easier. You will never have to worry about memory corruption due to a bad pointer, for example (unless you’re working with unsafe code, of course). But managed code has its own set of memory concerns. For example, failing to unsubscribe from events when you are done with them leaves the publisher of an event with a reference to the subscriber. If you eliminate all your own references to the subscriber, then that memory is effectively lost since the GC won’t delete it because of the publishing object’s reference. When the publishing object itself becomes subject to garbage collection then you’ll get that memory back finally, but that could take a very long time depending of the life of the publisher. Another common source of resource leaks is failing to properly release unmanaged resources. When writing a class that contains members that hold unmanaged resources (e.g. any of the Stream-derived classes, IsolatedStorageFile, most classes ending in “Reader” or “Writer”), you should always implement IDisposable, making sure to use a properly written Dispose method. And when you are using an instance of a class that implements IDisposable, you should always make sure to use a 'using' statement in order to ensure that the object’s unmanaged resources are disposed of properly. (A ‘using’ statement is a nicer, cleaner looking, and easier to use version of a try-finally block. The compiler actually translates it as though it were a try-finally block. Note that Code Analysis warning 2202 (CA2202) will often be triggered by nested using blocks. A properly written dispose method ensures that it only runs once such that calling dispose multiple times should not be a problem. Nonetheless, CA2202 exists and if you want to avoid triggering it then you should write your code such that only the innermost IDisposable object uses a ‘using’ statement, with any outer code making use of appropriate try-finally blocks instead). Then, of course, there are situations where you are operating in a memory-constrained environment or else you want to limit or even eliminate allocations within a certain part of your program (e.g. within the main game loop of an XNA game) in order to avoid having the GC run. On the Xbox 360 and Windows Phone 7, for example, for every 1 MB of heap allocations you make, the GC runs; the added time of a GC collection can cause a game to drop frames or run slowly thereby making it look bad. Eliminating allocations (or else minimizing them and calling an explicit Collect at an appropriate time) is a common way of avoiding this (the other way is to simplify your heap so that the GC’s latency is low enough not to cause performance issues). ANTS Memory Profiler 7.0 When the opportunity to review Red Gate’s recently released ANTS Memory Profiler 7.0 arose, I jumped at it. In order to review it, I was given a free copy (which does not include upgrade rights for future versions) which I am allowed to keep. For those of you who are familiar with ANTS Memory Profiler, you can find a list of new features and enhancements here. If you are an experienced .NET developer who is familiar with .NET memory management issues, ANTS Memory Profiler is great. More importantly still, if you are new to .NET development or you have no experience or limited experience with memory profiling, ANTS Memory Profiler is awesome. From the very beginning, it guides you through the process of memory profiling. If you’re experienced and just want dive in however, it doesn’t get in your way. The help items GAHSFLASHDAJLDJA are well designed and located right next to the UI controls so that they are easy to find without being intrusive. When you first launch it, it presents you with a “Getting Started” screen that contains links to “Memory profiling video tutorials”, “Strategies for memory profiling”, and the “ANTS Memory Profiler forum”. I’m normally the kind of person who looks at a screen like that only to find the “Don’t show this again” checkbox. Since I was doing a review, though, I decided I should examine them. I was pleasantly surprised. The overview video clocks in at three minutes and fifty seconds. It begins by showing you how to get started profiling an application. It explains that profiling is done by taking memory snapshots periodically while your program is running and then comparing them. ANTS Memory Profiler (I’m just going to call it “ANTS MP” from here) analyzes these snapshots in the background while your application is running. It briefly mentions a new feature in Version 7, a new API that give you the ability to trigger snapshots from within your application’s source code (more about this below). You can also, and this is the more common way you would do it, take a memory snapshot at any time from within the ANTS MP window by clicking the “Take Memory Snapshot” button in the upper right corner. The overview video goes on to demonstrate a basic profiling session on an application that pulls information from a database and displays it. It shows how to switch which snapshots you are comparing, explains the different sections of the Summary view and what they are showing, and proceeds to show you how to investigate memory problems using the “Instance Categorizer” to track the path from an object (or set of objects) to the GC’s root in order to find what things along the path are holding a reference to it/them. For a set of objects, you can then click on it and get the “Instance List” view. This displays all of the individual objects (including their individual sizes, values, etc.) of that type which share the same path to the GC root. You can then click on one of the objects to generate an “Instance Retention Graph” view. This lets you track directly up to see the reference chain for that individual object. In the overview video, it turned out that there was an event handler which was holding on to a reference, thereby keeping a large number of strings that should have been freed in memory. Lastly the video shows the “Class List” view, which lets you dig in deeply to find problems that might not have been clear when following the previous workflow. Once you have at least one memory snapshot you can begin analyzing. The main interface is in the “Analysis” tab. You can also switch to the “Session Overview” tab, which gives you several bar charts highlighting basic memory data about the snapshots you’ve taken. If you hover over the individual bars (and the individual colors in bars that have more than one), you will see a detailed text description of what the bar is representing visually. The Session Overview is good for a quick summary of memory usage and information about the different heaps. You are going to spend most of your time in the Analysis tab, but it’s good to remember that the Session Overview is there to give you some quick feedback on basic memory usage stats. As described above in the summary of the overview video, there is a certain natural workflow to the Analysis tab. You’ll spin up your application and take some snapshots at various times such as before and after clicking a button to open a window or before and after closing a window. Taking these snapshots lets you examine what is happening with memory. You would normally expect that a lot of memory would be freed up when closing a window or exiting a document. By taking snapshots before and after performing an action like that you can see whether or not the memory is really being freed. If you already know an area that’s giving you trouble, you can run your application just like normal until just before getting to that part and then you can take a few strategic snapshots that should help you pin down the problem. Something the overview didn’t go into is how to use the “Filters” section at the bottom of ANTS MP together with the Class List view in order to narrow things down. The video tutorials page has a nice 3 minute intro video called “How to use the filters”. It’s a nice introduction and covers some of the basics. I’m going to cover a bit more because I think they’re a really neat, really helpful feature. Large programs can bring up thousands of classes. Even simple programs can instantiate far more classes than you might realize. In a basic .NET 4 WPF application for example (and when I say basic, I mean just MainWindow.xaml with a button added to it), the unfiltered Class List view will have in excess of 1000 classes (my simple test app had anywhere from 1066 to 1148 classes depending on which snapshot I was using as the “Current” snapshot). This is amazing in some ways as it shows you how in stark detail just how immensely powerful the WPF framework is. But hunting through 1100 classes isn’t productive, no matter how cool it is that there are that many classes instantiated and doing all sorts of awesome things. Let’s say you wanted to examine just the classes your application contains source code for (in my simple example, that would be the MainWindow and App). Under “Basic Filters”, click on “Classes with source” under “Show only…”. Voilà. Down from 1070 classes in the snapshot I was using as “Current” to 2 classes. If you then click on a class’s name, it will show you (to the right of the class name) two little icon buttons. Hover over them and you will see that you can click one to view the Instance Categorizer for the class and another to view the Instance List for the class. You can also show classes based on which heap they live on. If you chose both a Baseline snapshot and a Current snapshot then you can use the “Comparing snapshots” filters to show only: “New objects”; “Surviving objects”; “Survivors in growing classes”; or “Zombie objects” (if you aren’t sure what one of these means, you can click the helpful “?” in a green circle icon to bring up a popup that explains them and provides context). Remember that your selection(s) under the “Show only…” heading will still apply, so you should update those selections to make sure you are seeing the view you want. There are also links under the “What is my memory problem?” heading that can help you diagnose the problems you are seeing including one for “I don’t know which kind I have” for situations where you know generally that your application has some problems but aren’t sure what the behavior you have been seeing (OutOfMemoryExceptions, continually growing memory usage, larger memory use than expected at certain points in the program). The Basic Filters are not the only filters there are. “Filter by Object Type” gives you the ability to filter by: “Objects that are disposable”; “Objects that are/are not disposed”; “Objects that are/are not GC roots” (GC roots are things like static variables); and “Objects that implement _______”. “Objects that implement” is particularly neat. Once you check the box, you can then add one or more classes and interfaces that an object must implement in order to survive the filtering. Lastly there is “Filter by Reference”, which gives you the option to pare down the list based on whether an object is “Kept in memory exclusively by” a particular item, a class/interface, or a namespace; whether an object is “Referenced by” one or more of those choices; and whether an object is “Never referenced by” one or more of those choices. Remember that filtering is cumulative, so anything you had set in one of the filter sections still remains in effect unless and until you go back and change it. There’s quite a bit more to ANTS MP – it’s a very full featured product – but I think I touched on all of the most significant pieces. You can use it to debug: a .NET executable; an ASP.NET web application (running on IIS); an ASP.NET web application (running on Visual Studio’s built-in web development server); a Silverlight 4 browser application; a Windows service; a COM+ server; and even something called an XBAP (local XAML browser application). You can also attach to a .NET 4 process to profile an application that’s already running. The startup screen also has a large number of “Charting Options” that let you adjust which statistics ANTS MP should collect. The default selection is a good, minimal set. It’s worth your time to browse through the charting options to examine other statistics that may also help you diagnose a particular problem. The more statistics ANTS MP collects, the longer it will take to collect statistics. So just turning everything on is probably a bad idea. But the option to selectively add in additional performance counters from the extensive list could be a very helpful thing for your memory profiling as it lets you see additional data that might provide clues about a particular problem that has been bothering you. ANTS MP integrates very nicely with all versions of Visual Studio that support plugins (i.e. all of the non-Express versions). Just note that if you choose “Profile Memory” from the “ANTS” menu that it will launch profiling for whichever project you have set as the Startup project. One quick tip from my experience so far using ANTS MP: if you want to properly understand your memory usage in an application you’ve written, first create an “empty” version of the type of project you are going to profile (a WPF application, an XNA game, etc.) and do a quick profiling session on that so that you know the baseline memory usage of the framework itself. By “empty” I mean just create a new project of that type in Visual Studio then compile it and run it with profiling – don’t do anything special or add in anything (except perhaps for any external libraries you’re planning to use). The first thing I tried ANTS MP out on was a demo XNA project of an editor that I’ve been working on for quite some time that involves a custom extension to XNA’s content pipeline. The first time I ran it and saw the unmanaged memory usage I was convinced I had some horrible bug that was creating extra copies of texture data (the demo project didn’t have a lot of texture data so when I saw a lot of unmanaged memory I instantly figured I was doing something wrong). Then I thought to run an empty project through and when I saw that the amount of unmanaged memory was virtually identical, it dawned on me that the CLR itself sits in unmanaged memory and that (thankfully) there was nothing wrong with my code! Quite a relief. Earlier, when discussing the overview video, I mentioned the API that lets you take snapshots from within your application. I gave it a quick trial and it’s very easy to integrate and make use of and is a really nice addition (especially for projects where you want to know what, if any, allocations there are in a specific, complicated section of code). The only concern I had was that if I hadn’t watched the overview video I might never have known it existed. Even then it took me five minutes of hunting around Red Gate’s website before I found the “Taking snapshots from your code" article that explains what DLL you need to add as a reference and what method of what class you should call in order to take an automatic snapshot (including the helpful warning to wrap it in a try-catch block since, under certain circumstances, it can raise an exception, such as trying to call it more than 5 times in 30 seconds. The difficulty in discovering and then finding information about the automatic snapshots API was one thing I thought could use improvement. Another thing I think would make it even better would be local copies of the webpages it links to. Although I’m generally always connected to the internet, I imagine there are more than a few developers who aren’t or who are behind very restrictive firewalls. For them (and for me, too, if my internet connection happens to be down), it would be nice to have those documents installed locally or to have the option to download an additional “documentation” package that would add local copies. Another thing that I wish could be easier to manage is the Filters area. Finding and setting individual filters is very easy as is understanding what those filter do. And breaking it up into three sections (basic, by object, and by reference) makes sense. But I could easily see myself running a long profiling session and forgetting that I had set some filter a long while earlier in a different filter section and then spending quite a bit of time trying to figure out why some problem that was clearly visible in the data wasn’t showing up in, e.g. the instance list before remembering to check all the filters for that one setting that was only culling a few things from view. Some sort of indicator icon next to the filter section names that appears you have at least one filter set in that area would be a nice visual clue to remind me that “oh yeah, I told it to only show objects on the Gen 2 heap! That’s why I’m not seeing those instances of the SuperMagic class!” Something that would be nice (but that Red Gate cannot really do anything about) would be if this could be used in Windows Phone 7 development. If Microsoft and Red Gate could work together to make this happen (even if just on the WP7 emulator), that would be amazing. Especially given the memory constraints that apps and games running on mobile devices need to work within, a good memory profiler would be a phenomenally helpful tool. If anyone at Microsoft reads this, it’d be really great if you could make something like that happen. Perhaps even a (subsidized) custom version just for WP7 development. (For XNA games, of course, you can create a Windows version of the game and use ANTS MP on the Windows version in order to get a better picture of your memory situation. For Silverlight on WP7, though, there’s quite a bit of educated guess work and WeakReference creation followed by forced collections in order to find the source of a memory problem.) The only other thing I found myself wanting was a “Back” button. Between my Windows Phone 7, Zune, and other things, I’ve grown very used to having a “back stack” that lets me just navigate back to where I came from. The ANTS MP interface is surprisingly easy to use given how much it lets you do, and once you start using it for any amount of time, you learn all of the different areas such that you know where to go. And it does remember the state of the areas you were previously in, of course. So if you go to, e.g., the Instance Retention Graph from the Class List and then return back to the Class List, it will remember which class you had selected and all that other state information. Still, a “Back” button would be a welcome addition to a future release. Bottom Line ANTS Memory Profiler is not an inexpensive tool. But my time is valuable. I can easily see ANTS MP saving me enough time tracking down memory problems to justify it on a cost basis. More importantly to me, knowing what is happening memory-wise in my programs and having the confidence that my code doesn’t have any hidden time bombs in it that will cause it to OOM if I leave it running for longer than I do when I spin it up real quickly for debugging or just to see how a new feature looks and feels is a good feeling. It’s a feeling that I like having and want to continue to have. I got the current version for free in order to review it. Having done so, I’ve now added it to my must-have tools and will gladly lay out the money for the next version when it comes out. It has a 14 day free trial, so if you aren’t sure if it’s right for you or if you think it seems interesting but aren’t really sure if it’s worth shelling out the money for it, give it a try.

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

Search Results

Search found 6515 results on 261 pages for 'half life'.

Page 259/261 | < Previous Page | 255 256 257 258 259 260 261 | Next Page >

- by melee

- by PaulH

- by Ben

- by user559601

- by darkrain

- by Ophileon

- by Midimistro

- by user1865053

- by WulfgarPro

- by ??iu

- by pts

- by Rick Koshi

- by Martijn Heemels

- by ??iu

- by faye.todd(at)oracle.com

- by danx

- by Simon Thorpe

- by Stefan Hinker

- by Rick Strahl

- by user647124

- by Rick Strahl

- by dwahlin

- by Alan Smith

- by Michael B. McLaughlin

- by user12620111

< Previous Page | 255 256 257 258 259 260 261 | Next Page >