OS Analytics - Deep Dive Into Your OS
- by Eran_Steiner
Enterprise Manager Ops Center provides a feature called "OS Analytics". This feature allows you to get a better understanding of how the Operating System is being utilized. You can research the historical usage as well as real time data. This post will show how you can benefit from OS Analytics and how it works behind the scenes. 
     
       
         
  We will have a call to discuss this blog - please join us!
  Date: Thursday, November 1, 2012
  Time: 11:00 am, Eastern Daylight Time (New York, GMT-04:00)
  1. Go to https://oracleconferencing.webex.com/oracleconferencing/j.php?ED=209833067&UID=1512092402&PW=NY2JhMmFjMmFh&RT=MiMxMQ%3D%3D
  2. If requested, enter your name and email address.
  3. If a password is required, enter the meeting password: oracle123
  4. Click "Join".
  To join the teleconference:
  Call-in toll-free number: 1-866-682-4770 (US/Canada)
  Other countries: https://oracle.intercallonline.com/portlets/scheduling/viewNumbers/viewNumber.do?ownerNumber=5931260&audioType=RP&viewGa=true&ga=ON
  Conference Code: 7629343#
  Security code: 7777#
         
       
      
    
    
  Here is a quick summary of what you can do with OS Analytics in Ops Center: 
   
    View historical charts and real-time values of CPU, memory, network and disk utilization 
    Find the top CPU and memory processes in real time or on a specific day in the past 
    Determine proper monitoring thresholds based on historical data 
    View Solaris services status details 
    Drill down into the details of a process 
    View the busiest zones if applicable 
   
  Where to start 
  To start with OS Analytics, choose the OS asset in the tree and click the Analytics tab. 
  You can see the CPU, memory and network utilization, along with the current real-time top 5 processes in each category: 
   
   In the above screen, you can click each of the top 5 processes to see a more detailed view of that process. Here is an example of one of the processes: 
   
  One of the cool things is that you can see the process tree for this process along with some port binding and open file descriptors. 
     
    
  On Solaris machines with zones, you get an extra level of tabs, allowing you to get more information on the different zones: 
   
    
    
    
    
  This is a good way to see the busiest zones. For example, one zone may not take a lot of CPU but it can consume a lot of memory, or perhaps network bandwidth. To see the detailed Analytics for each of the zones, simply click each of the zones in the tree and go to its Analytics tab. 
   
    
  Next, click the "Processes" tab to see real time information of all the processes on the machine: 
   
    
    
   An interesting column is the "Target" column. If you configured Ops Center to work with Enterprise Manager Cloud Control, then the two products will talk to each other and Ops Center will display the correlated target from Cloud Control in this table. If you are only using Ops Center - this column will remain empty. 
     
    
  Next, if you view a Solaris machine, you will have a "Services" tab: 
    
    
   By default, all services will be displayed, but you can choose to display only certain states, for example, those in maintenance or the degraded ones. You can highlight a service and choose to view the details, where you can see the Dependencies, Dependents and also the location of the service log file (not shown in the picture as you need to scroll down to see the log file). 
   
    
    
    
  The "Threshold" tab is particularly helpful - you can view historical trends of different monitored values and based on the graph - determine what the monitoring values should be: 
    
  You can ask Ops Center to suggest monitoring levels based on the historical values or you can set your own. The different colors in the graph represent the current set levels: Red for critical, Yellow for warning and Blue for Information, allowing you to quickly see how they're positioned against real data. 
  It's important to note that when looking at longer periods, Ops Center smooths out the data and uses averages. So when looking at values such as CPU Usage, try shorter time frames which are more detailed, such as one hour or one day.  
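
  To illustrate why averaged data can hide a short spike, here is a minimal Python sketch. It is not Ops Center's actual smoothing algorithm, which this post does not describe. The sample values and the downsample helper are hypothetical and only show the general effect of averaging over longer windows.

```python
from statistics import mean

def downsample(samples, window):
    """Average consecutive, non-overlapping windows of samples (hypothetical helper)."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical CPU utilization (%), one sample per minute for one hour,
# mostly idle with a single five-minute spike.
cpu = [10.0] * 60
cpu[30:35] = [95.0] * 5

print(max(cpu))                   # peak in the detailed data: 95.0
print(max(downsample(cpu, 5)))    # five-minute averages still show the spike
print(max(downsample(cpu, 60)))   # one hourly average flattens it to roughly 17
```

  A threshold chosen from the hourly average alone would sit far below the real peak, which is why the shorter time frames are more useful for values like CPU usage.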
  Applying new monitoring values 
    
    
  When you first apply new values to monitored attributes, a popup asks whether it's OK to remove the machine from its current Monitoring Policy. This is fine if you want custom monitoring for a specific machine, or if you want to use this machine as a "gold image" and extract a Monitoring Policy from it. You can later apply the new Monitoring Policy to other machines and also set it as the default Monitoring Policy.  
  Once you're done with applying the different monitoring values, you can review and change them in the "Monitoring" tab. You can also click the "Extract a Monitoring Policy" in the actions pane on the right to save all the new values to a new Monitoring Policy, which can then be found under "Plan Management" -> "Monitoring Policies".  
  Visiting the past 
  Under the "History" tab you can "go back in time". This is very helpful when you know that a machine was busy a few hours ago (perhaps in the middle of the night?), but you were not around to take a look at it in real time. Here's a view into yesterday's data on one of the machines: 
   
  You can see an interesting CPU spike happening at around 3:30 am along with some memory use. In the bottom table you can see the top 5 CPU and Memory consumers at the requested time. Very quickly you can see that this spike is related to the Solaris 11 IPS repository synchronization process using the "pkgrecv" command. 
    
  The "time machine" doesn't stop here - you can also view historical data to determine which of the zones was the busiest at a given time: 
   
     
  Under the hood 
  The data collected is stored on each of the agents under /var/opt/sun/xvm/analytics/historical/ 
   
     An "os.zip" file exists for the main OS. Inside you will find many small text files, each named after the epoch timestamp at which it was taken 
     If you have any zones, there will be a file called "guests.zip" containing the same kind of small files for all the zones, as well as a folder named after each zone with an "os.zip" in it 
    If this is the Enterprise Controller or the Proxy Controller, you will have folders called "proxy" and "sat" in which you will find the "os.zip" for that controller 
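
  If you want to see which points in time an archive covers, a small Python sketch like the one below can list the entries and convert their names to readable dates. This is only an illustration: it assumes the entry names are epoch timestamps in seconds (possibly with a file extension), which you should verify against your own files, and list_samples is a hypothetical helper, not part of Ops Center.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import PurePosixPath

def list_samples(zip_path):
    """Return sorted (UTC datetime, entry name) pairs for entries named after an epoch timestamp."""
    samples = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Strip any folder and extension, then keep only all-digit names.
            stem = PurePosixPath(name).stem
            if not stem.isdigit():
                continue
            # Assumption: the name is epoch time in seconds, not milliseconds.
            taken_at = datetime.fromtimestamp(int(stem), tz=timezone.utc)
            samples.append((taken_at, name))
    return sorted(samples)
```

  For example, calling list_samples("/var/opt/sun/xvm/analytics/historical/os.zip") on an agent should return one pair per collected sample, oldest first. Work on a copy of the archive rather than the live file if the agent is running.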
    
  The actual script collecting the data can be viewed for debugging purposes as well: 
   
    On Linux, the location is: /opt/sun/xvmoc/private/os_analytics/collect 
    On Solaris, the location is /opt/SUNWxvmoc/private/os_analytics/collect  
   
  If you would like to redirect all the standard error into a file for debugging, touch the following file and the output will go into it: 
  # touch /tmp/.collect.stderr    
  The temporary data is collected under /var/opt/sun/xvm/analytics/.collectdb until it is zipped. 
  If you would like to review the Analytics properties, you can view them on each agent in /opt/sun/n1gc/lib/XVM.properties. Find the section "Analytics configurable properties for OS and VSC" to view the Analytics-specific values.  
    
  I hope you find this helpful! Please post questions in the comments below. 
   Eran Steiner