Search Results

Search found 63182 results on 2528 pages for 'data driven tests'.

Page 448/2528 | < Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455 | Next Page >

Why should I prepend my custom attributes with "data-"?

- by Horace Loeb

So any custom data attribute that I use should start with "data-": <li class="user" data-name="John Resig" data-city="Boston" data-lang="js" data-food="Bacon"> John says: Hello, how are you? </li> Will anything bad happen if I just ignore this? I.e.: <li class="user" name="John Resig" city="Boston" lang="js" food="Bacon"> John says: Hello, how are you? </li> I guess one bad thing is that my custom attributes could conflict with HTML attributes with special meanings (e.g., name), but aside from this, is there a problem with just writing "example_text" instead of "data-example_text"? (It won't validate, but who cares?)

Read the article
How can I allow users to switch data sources for an SSRS report?

- by fatcat1111

I have two SQL Server databases with identical schemas, but different data. I also have SSRS generating reports, in native mode, for one of the databases. All reports the same shared data source. I would like to allow users to get reports for the other database. I created a second shared data source for the second database. Modifying the reports to use this second data source results in reports as expected. Because the RDLs are the same, except for the data source, and because I don't want to maintain what are basically duplicate reports, I'm looking for a way to dynamically switch data sources, depending on user input. Is there an easy means of accomplishing this? An existing solution would be best. Barring that, can the RDL's data source be parametrized? Or, can the RDS's connection string be parametrized?

Read the article
Need a way for users to enter data while offline and re-submit it when back online

- by crankharder

As part of a larger webapp, I want to build functionality that allows a user to enter data while offline -- and then send that data back to my site when they have a connection again The parts that, to me, are missing ar Saving a certain set of data in their browser Saving a form that allows them to enter data using form from step#2 to update data from step#1 getting data out of the local data store and sending it back to the server I would like to keep this entirely within the browser, so... Does HTML5 meet some (or all) of those goals as it's currently implemented in webkit/ff3? If not,what technologies should I start looking into in order to accomplish all of the above.

Read the article
ADF Business Components

- by Arda Eralp

ADF Business Components and JDeveloper simplify the development, delivery, and customization of business applications for the Java EE platform. With ADF Business Components, developers aren't required to write the application infrastructure code required by the typical Java EE application to: Connect to the database Retrieve data Lock database records Manage transactions ADF Business Components addresses these tasks through its library of reusable software components and through the supporting design time facilities in JDeveloper. Most importantly, developers save time using ADF Business Components since the JDeveloper design time makes typical development tasks entirely declarative. In particular, JDeveloper supports declarative development with ADF Business Components to: Author and test business logic in components which automatically integrate with databases Reuse business logic through multiple SQL-based views of data, supporting different application tasks Access and update the views from browser, desktop, mobile, and web service clients Customize application functionality in layers without requiring modification of the delivered application The goal of ADF Business Components is to make the business services developer more productive. ADF Business Components provides a foundation of Java classes that allow your business-tier application components to leverage the functionality provided in the following areas: Simplifying Data Access Design a data model for client displays, including only necessary data Include master-detail hierarchies of any complexity as part of the data model Implement end-user Query-by-Example data filtering without code Automatically coordinate data model changes with business services layer Automatically validate and save any changes to the database Enforcing Business Domain Validation and Business Logic Declaratively enforce required fields, primary key uniqueness, data precision-scale, and foreign key references Easily capture and enforce both simple and complex business rules, programmatically or declaratively, with multilevel validation support Navigate relationships between business domain objects and enforce constraints related to compound components Supporting Sophisticated UIs with Multipage Units of Work Automatically reflect changes made by business service application logic in the user interface Retrieve reference information from related tables, and automatically maintain the information when the user changes foreign-key values Simplify multistep web-based business transactions with automatic web-tier state management Handle images, video, sound, and documents without having to use code Synchronize pending data changes across multiple views of data Consistently apply prompts, tooltips, format masks, and error messages in any application Define custom metadata for any business components to support metadata-driven user interface or application functionality Add dynamic attributes at runtime to simplify per-row state management Implementing High-Performance Service-Oriented Architecture Support highly functional web service interfaces for business integration without writing code Enforce best-practice interface-based programming style Simplify application security with automatic JAAS integration and audit maintenance "Write once, run anywhere": use the same business service as plain Java class, EJB session bean, or web service Streamlining Application Customization Extend component functionality after delivery without modifying source code Globally substitute delivered components with extended ones without modifying the application ADF Business Components implements the business service through the following set of cooperating components: Entity object An entity object represents a row in a database table and simplifies modifying its data by handling all data manipulation language (DML) operations for you. These are basically your 1 to 1 representation of a database table. Each table in the database will have 1 and only 1 EO. The EO contains the mapping between columns and attributes. EO's also contain the business logic and validation. These are you core data services. They are responsible for updating, inserting and deleting records. The Attributes tab displays the actual mapping between attributes and columns, the mapping has following fields: Name : contains the name of the attribute we expose in our data model. Type : defines the data type of the attribute in our application. Column : specifies the column to which we want to map the attribute with Column Type : contains the type of the column in the database View object A view object represents a SQL query. You use the full power of the familiar SQL language to join, filter, sort, and aggregate data into exactly the shape required by the end-user task. The attributes in the View Objects are actually coming from the Entity Object. In the end the VO will generate a query but you basically build a VO by selecting which EO need to participate in the VO and which attributes of those EO you want to use. That's why you have the Entity Usage column so you can see the relation between VO and EO. In the query tab you can clearly see the query that will be generated for the VO. At this stage we don't need it and just use it for information purpose. In later stages we might use it. Application module An application module is the controller of your data layer. It is responsible for keeping hold of the transaction. It exposes the data model to the view layer. You expose the VO's through the Application Module. This is the abstraction of your data layer which you want to show to the outside word.It defines an updatable data model and top-level procedures and functions (called service methods) related to a logical unit of work related to an end-user task. While the base components handle all the common cases through built-in behavior, customization is always possible and the default behavior provided by the base components can be easily overridden or augmented. When you create EO's, a foreign key will be translated into an association in our model. It defines the type of relation and who is the master and child as well as how the visibility of the association looks like. A similar concept exists to identify relations between view objects. These are called view links. These are almost identical as association except that a view link is based upon attributes defined in the view object. It can also be based upon an association. Here's a short summary: Entity Objects: representations of tables Association: Relations between EO's. Representations of foreign keys View Objects: Logical model View Links: Relationships between view objects Application Model: interface to your application

Read the article
Why can't I see any data in the Google App Engine *Development* Console?

- by willem

I run my google app engine application in one of two ways... Directly by using the application from http://localhost:8080 Or execute unit tests from http://localhost:8080/test When I create entities by using the application directly, the data is visible in the Development Console (dataStore view). However, when I execute the unit tests... even if they succeed and I can put() and get() data, the data does not show in the dataStore view. Any idea why I can't see my data? Even though it is there? Notes: I use GAEUnit for unit tests. the data stored mostly consists of StringProperties(). I use Python and run Django on top of the GAE, don't know if that matters.

Read the article
Socket read() hangs for a while when there is no data to read.

- by janesconference

Hi' I'm writing a simple http port forwarder. I read data from port 80, and pass the data to my lighttpd server, on port 8080. As long as I write() data on the socket on port 8080 (forwarding the request) there's no problem, but when I read() data from that socket (forwarding the response), the last read() hangs a lot (about 1 or 2 seconds) before realizing there's no more data and returning 0. I tried to set the socket to non-blocking, but this doesn't work, as sometimes it returns EWOULDBLOCKING even if there's some data left (lighttpd + cgi can be quite slow). I tried to set a timeout with select(), but, as above, a slow cgi could timeout the socket when there's actually some data to transmit. How would you do?

Read the article
How can i supply an AntiForgeryToken when posting JSON data using $.ajax ?

- by HerbalMart

I am using the code as below of this post: First i will an fill array variable with the correct values for the controller action. Using the code below i think it should be very straigtforward by just adding the following line to the javascript: data["__RequestVerificationToken"] = $('[name=__RequestVerificationToken]').val(); The <%= Html.AntiForgeryToken() %> is at his right place and the action has a [ValidateAntiForgeryToken] But my controller action keeps saying: "Invalid forgery token" What am i doing wrong here? Code data["fiscalyear"] = fiscalyear; data["subgeography"] = $(list).parent().find('input[name=subGeography]').val(); data["territories"] = new Array(); $(items).each(function() { data["territories"].push($(this).find('input[name=territory]').val()); }); if (url != null) { $.ajax( { dataType: 'JSON', contentType: 'application/json; charset=utf-8', url: url, type: 'POST', context: document.body, data: JSON.stringify(data), success: function() { refresh(); } }); }

Read the article
Best practice -- Content Tracking Remote Data (cURL, file_get_contents, cron, et. al)?

- by user322787

I am attempting to build a script that will log data that changes every 1 second. The initial thought was "Just run a php file that does a cURL every second from cron" -- but I have a very strong feeling that this isn't the right way to go about it. Here are my specifications: There are currently 10 sites I need to gather data from and log to a database -- this number will invariably increase over time, so the solution needs to be scalable. Each site has data that it spits out to a URL every second, but only keeps 10 lines on the page, and they can sometimes spit out up to 10 lines each time, so I need to pick up that data every second to ensure I get all the data. As I will also be writing this data to my own DB, there's going to be I/O every second of every day for a considerably long time. Barring magic, what is the most efficient way to achieve this? it might help to know that the data that I am getting every second is very small, under 500bytes.

Read the article
how read data from file and execute to MYSQL?

- by Mahran Elneel

i create form to load sql file and used fopen function top open file and read this but when want to execute this data to database not work? what is wrong in my code????????????? $ofile = trim(basename($_FILES['sqlfile']['name'])); $path = "sqlfiles/".$ofile; //$data = settype($data,"string"); $file = ""; $connect = mysql_connect('localhost','root',''); $selectdb = mysql_select_db('files'); if(isset($_POST['submit'])) { if(!move_uploaded_file($_FILES['sqlfile']['tmp_name'],"sqlfiles/".$ofile)) { $path = ""; } $file = fopen("sqlfiles/".$ofile,"r") or exit("error open file!"); while (!feof($file)) { $data = fgetc($file); settype($data,"string"); $rslt = mysql_query($data); print $data; } fclose($file); }

Read the article
How can I access the row index numbers on a data frame in R?

- by user123276

I have a data frame that was sampled from another data frame. As a result, when I print the output of the data frame, I get a jumble of numbers on the left hand side of the data frame. The original data frame was nicely numbered from 1,2,3,4,5, and so on. But my new data frame is numbered 5,15,3,65, etc on the left hand side. Is there a way I can access the row index information for a data frame in R? thank you!

Read the article
[Python] Best strategy for dealing with incomplete lines of data from a file.

- by adoran

I use the following block of code to read lines out of a file 'f' into a nested list: for data in f: clean_data = data.rstrip() data = clean_data.split('\t') t += [data[0]] strmat += [data[1:]] Sometimes, however, the data is incomplete and a row may look like this: ['955.159', '62.8168', '', '', '', '', '', '', '', '', '', '', '', '', '', '29', '30', '0', '0'] It puts a spanner in the works because I would like Python to implicitly cast my list as floats but the empty fields '' cause it to be cast as an array of strings (dtype: s12). I could start a second 'if' statement and convert all empty fields into NULL (since 0 is wrong in this instance) but I was unsure whether this was best. Is this the best strategy of dealing with incomplete data? Should I edit the stream or do it post-hoc?

Read the article
why does the data property in an jquery ajax call override my return false?

- by user315709

hi, i have the following block of code: $("#contact_container form, #contact_details form").live( "submit", function(event) { $.ajax({ type: this.method, url: this.action, data: this.serialize(), success: function(data) { data = $(data).find("#content"); $("#contact_details").html(data); }, }); return false; } ; when i leave out the data: this.serialize(), it behaves properly and displays the response within the #contact_details div. however, when i leave it in, it submits the form, causing the page to navigate away. why does the presence of the data attribute negates the return false? (probably due to a bug that i can't spot...) also, is the syntax to my find statement correct? it comes back as "undefined" even though i use a debugger to check the ajax response and that id does exists. thanks, steve

Read the article
Unable to Mange DNS via MMC

- by IT Helpdesk Team Manager

When trying to access the DNS service on Microsoft Windows Server 2003 (Build 3790) domain controller/schema master via the MMC DNS snap in or locally via the DNS MMC from Administrative tools I'm getting a red "X" through the icon for the DNS Server. The inability to access DNS management via MMC happens on all domain controllers as well. We've looked at items such as the DHCP client not being started, incorrect DNS setup ( the machine points at itself and another DC ), the DNS service not running ( it is and all DNS queries via NSLOOKUP work correctly ), dslint returns the correct information and functions as expected. There is the following entry in the DNS event log: The DNS server could not initialize the remote procedure call (RPC) service. If it is not running, start the RPC service or reboot the computer. The event data is the error code. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp. 0000: 0000051b dnscmd fails with RPC server unavailable yet RPC is started: C:\Documents and Settings\Administrator.DOMAIN>dnscmd /Info Info query failed status = 1722 (0x000006ba) Command failed: RPC_S_SERVER_UNAVAILABLE 1722 (000006ba) DCDIAG /TEST:DNS /V /E produces the following errors: Warning: no DNS RPC connectivity (error or non Microsoft DNS server is running) [Error details: 1753 (Type: Win32 - Description: There are no more endpoints available from the endpoint mapper.)] Warning: no DNS RPC connectivity (error or non Microsoft DNS server is running) [Error details: 1722 (Type: Win32 - Description: The RPC server is unavailable.)] The DNS server could not initialize the remote procedure call (RPC) service. If it is not running, start the RPC service or reboot the computer. The event data is the error code. A DNS query for _ldap._tcp.dc._msdcs. returns the correct results. All domain and ADS related activities are working except that I can't manage my DNS via MMC or dnscmd. Any thoughts or solutions would be greatly appreciated. EDIT: Adding Registry export per request: Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc Class Name: <NO CLASS> Last Write Time: 10/18/2012 - 2:29 PM Value 0 Name: DCOM Protocols Type: REG_MULTI_SZ Data: ncacn_ip_tcp Value 1 Name: UuidSequenceNumber Type: REG_DWORD Data: 0xb19bd0f Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\ClientProtocols Class Name: <NO CLASS> Last Write Time: 3/9/2007 - 12:11 PM Value 0 Name: ncacn_np Type: REG_SZ Data: rpcrt4.dll Value 1 Name: ncacn_ip_tcp Type: REG_SZ Data: rpcrt4.dll Value 2 Name: ncadg_ip_udp Type: REG_SZ Data: rpcrt4.dll Value 3 Name: ncacn_http Type: REG_SZ Data: rpcrt4.dll Value 4 Name: ncacn_at_dsp Type: REG_SZ Data: rpcrt4.dll Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\NameService Class Name: <NO CLASS> Last Write Time: 2/20/2006 - 4:48 PM Value 0 Name: DefaultSyntax Type: REG_SZ Data: 3 Value 1 Name: Endpoint Type: REG_SZ Data: \pipe\locator Value 2 Name: NetworkAddress Type: REG_SZ Data: \\. Value 3 Name: Protocol Type: REG_SZ Data: ncacn_np Value 4 Name: ServerNetworkAddress Type: REG_SZ Data: \\. Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\NetBios Class Name: <NO CLASS> Last Write Time: 2/20/2006 - 4:48 PM Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\RpcProxy Class Name: <NO CLASS> Last Write Time: 3/9/2007 - 12:11 PM Value 0 Name: Enabled Type: REG_DWORD Data: 0x1 Value 1 Name: ValidPorts Type: REG_SZ Data: pdc:100-5000 Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\SecurityService Class Name: <NO CLASS> Last Write Time: 2/20/2006 - 4:48 PM Value 0 Name: 9 Type: REG_SZ Data: secur32.dll Value 1 Name: 10 Type: REG_SZ Data: secur32.dll Value 2 Name: 14 Type: REG_SZ Data: schannel.dll Value 3 Name: 16 Type: REG_SZ Data: secur32.dll Value 4 Name: 1 Type: REG_SZ Data: secur32.dll Value 5 Name: 18 Type: REG_SZ Data: secur32.dll Value 6 Name: 68 Type: REG_SZ Data: netlogon.dll

Read the article
Split large repo into multiple subrepos and preserve history (Mercurial)

- by Andrew

We have a large base of code that contains several shared projects, solution files, etc in one directory in SVN. We're migrating to Mercurial. I would like to take this opportunity to reorganize our code into several repositories to make cloning for branching have less overhead. I've already successfully converted our repo from SVN to Mercurial while preserving history. My question: how do I break all the different projects into separate repositories while preserving their history? Here is an example of what our single repository (OurPlatform) currently looks like: /OurPlatform ---- Core ---- Core.Tests ---- Database ---- Database.Tests ---- CMS ---- CMS.Tests ---- Product1.Domain ---- Product1.Stresstester ---- Product1.Web ---- Product1.Web.Tests ---- Product2.Domain ---- Product2.Stresstester ---- Product2.Web ---- Product2.Web.Tests ==== Product1.sln ==== Product2.sln All of those are folders containing VS Projects except for the solution files. Product1.sln and Product2.sln both reference all of the other projects. Ideally, I'd like to take each of those folders, and turn them into separate Hg repos, and also add new repos for each project (they would act as parent repos). Then, If someone was going to work on Product1, they would clone the Product1 repo, which contained Product1.sln and subrepo references to ReferenceAssemblies, Core, Core.Tests, Database, Database.Tests, CMS, and CMS.Tests. So, it's easy to do this by just hg init'ing in the project directories. But can it be done while preserving history? Or is there a better way to arrange this?

Read the article
Generic unit test scheduling

- by Raphink

Hello, I'm (re)writing a program that does generic unit test scheduling. The current program is a mono-threaded Perl program, but I'm willing to modularize it and parallelize the tests. I'm also considering rewriting it in Python. Here is what I need to do: I have a list of tests, with the following attributes: uri: a URI to test (could be HTTP/HTTPS/SSH/local) ; depends: an associative array of tests/values that this test depends on ; join: a list of DB joints to be added when selecting items to process in this test ; depends_db: additional conditions to add to the DB request when selecting items to process in this test. The program builds a dependency tree, beginning with the tests that have no dependencies ; for each test: a list of items is selected from the database using the conditions (results of depending tests, joints and depends_db) ; the list of items is sent to the URI (using POST or stdin) ; the result is retrived as a YAML file listing the state and comments for the test for each tested item ; the results are stored in the DB ; the test returns, allowing depending tests to be performed. the program generates reports (CSV, DB, graphviz) of the performed tests. The primary use of this program currently is to test a fleet of machines against services such as backup, DNS, etc. The tests can then be: - backup: hosted on the backup machine(s), called through HTTP, checks if the machines' backup went well ; - DNS: hosted on the local machine, called via stdin, checks if the machines' fqdn have a valid DNS entry. Does such a tool/module already exist? What would be the best implementation to achieve this (using Perl or Python)?

Read the article
missing rake tasks ??

- by richard moss

hi I have ran gem install rails and am running 2.3.4 but i am missing some rake tasks like 'db' and 'gems' if i run rake -T i get the following tasks. How can i get all the others ? rake apache2 # Build Apache 2 module rake clean # Remove compiled files rake clobber # Remove all generated files rake default # Build everything rake doc # Generate all documentation rake doxygen # Generate Doxygen C++ API documentation if ... rake doxygen:clobber # Remove generated Doxygen C++ API documenta... rake doxygen:force # Force generation of Doxygen C++ API docume... rake fakeroot # Create a fakeroot, useful for building nat... rake nginx # Build Nginx helper server rake package # Build all the packages rake package:clean # Remove package products rake package:debian # Create a Debian package rake package:force # Force a rebuild of the package files rake package:gem # Build the gem file passenger-2.2.4.gem rake rdoc # Build the rdoc HTML Files rake rdoc:clobber # Remove rdoc products rake rdoc:force # Force a rebuild of the RDOC files rake sloccount # Run 'sloccount' to see how much code Passe... rake test # Run all unit tests and integration tests rake test:cxx # Run unit tests for the Apache 2 and Nginx ... rake test:integration # Run all integration tests rake test:integration:apache2 # Run Apache 2 integration tests rake test:integration:nginx # Run Nginx integration tests rake test:oxt # Run unit tests for the OXT library rake test:rcov # Run coverage tests for the Ruby libraries rake test:restart # Run the 'restart' integration test infinit... rake test:ruby If anyone knows why this has happened, how i can fix it or anything else that could help, please let me know thanks alot rick

Read the article
Nose2 multiprocess error on Windows7

- by tt293

I was looking into nose2 as a way to get around the restrictions of having both xunit output and multiprocessing in nose1.3. However, when always-on is set to False in the [multiprocess] section, I can only get a single process running, while when running with always-on set to True, I get the following error: ---------------------------------------------------------------------- Ran 0 tests in 0.043s OK Traceback (most recent call last): File "C:\dev\testing\Tests\PythonTests\venv\Scripts\nose2-script.py", line 8, in <module> load_entry_point('nose2==0.4.7', 'console_scripts', 'nose2')() File "C:\dev\testing\Tests\PythonTests\venv\lib\site-packages\nose2-0.4.7-py2. 7.egg\nose2\main.py", line 284, in discover return main(*args, **kwargs) File "C:\dev\testing\Tests\PythonTests\venv\lib\site-packages\nose2-0.4.7-py2. 7.egg\nose2\main.py", line 98, in __init__ super(PluggableTestProgram, self).__init__(**kw) File "C:\dev\testing\Tests\PythonTests\venv\lib\site-packages\unittest2-0.5.1- py2.7.egg\unittest2\main.py", line 98, in __init__ self.runTests() File "C:\dev\testing\Tests\PythonTests\venv\lib\site-packages\nose2-0.4.7-py2. 7.egg\nose2\main.py", line 260, in runTests self.result = runner.run(self.test) File "C:\dev\testing\Tests\PythonTests\venv\lib\site-packages\nose2-0.4.7-py2. 7.egg\nose2\runner.py", line 53, in run executor(test, result) File "C:\dev\testing\Tests\PythonTests\venv\lib\site-packages\nose2-0.4.7-py2. 7.egg\nose2\plugins\mp.py", line 60, in _runmp ready, _, _ = select.select(rdrs, [], [], self.testRunTimeout) select.error: (10038, 'An operation was attempted on something that is not a soc ket') This is running python 2.7.5 (32bit) on Windows 7 in a virtualenv with six-1.1.0, unittest2-0.5.1 and nose2-0.4.7 (I get the same behavior outside of the venv, so I don't think that is the issue here).

Read the article
Execute JavaScript from within a C# assembly

- by ScottKoon

I'd like to execute JavaScript code from within a C# assembly and have the results of the JavaScript code returned to the calling C# code. It's easier to define things that I'm not trying to do: I'm not trying to call a JavaScript function on a web page from my code behind. I'm not trying to load a WebBrowser control. I don't want to have the JavaScript perform an AJAX call to a server. What I want to do is write unit tests in JavaScript and have then unit tests output JSON, even plain text would be fine. Then I want to have a generic C# class/executible that can load the file containing the JS, run the JS unit tests, scrap/load the results, and return a pass/fail with details during a post-build task. I think it's possible using the old ActiveX ScriptControl, but it seems like there ought to be a .NET way to do this without using SilverLight, the DLR, or anything else that hasn't shipped yet. Anyone have any ideas? update: From Brad Abrams blog namespace Microsoft.JScript.Vsa { [Obsolete("There is no replacement for this feature. Please see the ICodeCompiler documentation for additional help. http://go.microsoft.com/fwlink/?linkid=14202")] Clarification: We have unit tests for our JavaScript functions that are written in JavaScript using the JSUnit framework. Right now during our build process, we have to manually load a web page and click a button to ensure that all of the JavaScript unit tests pass. I'd like to be able to execute the tests during the post-build process when our automated C# unit tests are run and report the success/failure alongside of out C# unit tests and use them as an indicator as to whether or not the build is broken.

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Puppet: how to use data from a MySQL table in Puppet 3.0 templates?

- by Luke404

I have some data whose source-of-truth is in a MySQL database, size is expected to max out at the some-thousands-rows range (in a worst-case scenario) and I'd like to use puppet to configure files on some servers with that data (mostly iterating through those rows in a template). I'm currently using Puppet 3.0.x, and I cannot change the fact that MySQL will be the authoritative source for that data. Please note, data comes from external sources and not from puppet or from the managed nodes. What possible approaches are there? Which one would you recommend? Would External Node Classifiers be useful here? My "last resort" would be regularly dumping the table to a YAML file and reading that through Hiera to a Puppet template, or to directly dump the table in one or more pre-formatted text file(s) ready to be copied to the nodes. There is an unanswered question on SF about system users but the fundamental issue is probably similar to mine - he's trying to get data out of MySQL.

Read the article
How can I compare Excel serial dates WITHOUT converting to mm/dd/yy type dates?

- by dwwilson66

I have a table that contains a number of values representing Excel serial dates. After a number of unsuccessful attempts to compare fields, my current approach is to do comparisons between serial dates instead of calendar dates. I am trying to summarize the data--by DAY--with formulae. CONSIDER: 41021 some data 41021.625 some data 41021.63542 some data 41022 some data 41022.26042 some data 41022.91667 some data 41023 some data 41023.375 some data DESIRED RESULT: 41021 sum of 41021, 41021.625 and 41021.63542 data 41022 sum of 41022, 41022.26042 and 41022.91667 data 41023 sum of 41023 and 41023.375 data In essence, for all instances of SerialDate.SerialTime, SUM data values associated with SerialDate.* regardless of the *.SerialTime for that date. While I can see how to do this by creating additional dates column formatted as =TEXT(<DateField>,"mm/dd/yyyy") I'm looking for a solution that will allow me to handle this 'conversion' in the formula, e.g.SUMIF((TEXT(<dateRange>,"yy/mm/dd"),=(TEXT(<dateField,"yy/mm/dd")),<dataRange> Make sense? Anyone have any ideas? Thanks

Read the article
What steps to take when CPAN installation fails?

- by pythonic metaphor

I have used CPAN to install perl modules on quite a few occasions, but I've been lucky enough to just have it work. Unfortunately, I was trying to install Thread::Pool today and one of the required dependencies, Thread::Converyor::Monitored failed the test: Test Summary Report ------------------- t/Conveyor-Monitored02.t (Wstat: 65280 Tests: 89 Failed: 0) Non-zero exit status: 255 Parse errors: Tests out of sequence. Found (2) but expected (4) Tests out of sequence. Found (4) but expected (5) Tests out of sequence. Found (5) but expected (6) Tests out of sequence. Found (3) but expected (7) Tests out of sequence. Found (6) but expected (8) Displayed the first 5 of 86 TAP syntax errors. Re-run prove with the -p option to see them all. Files=3, Tests=258, 6 wallclock secs ( 0.07 usr 0.03 sys + 4.04 cusr 1.25 csys = 5.39 CPU) Result: FAIL Failed 1/3 test programs. 0/258 subtests failed. make: *** [test_dynamic] Error 255 ELIZABETH/Thread-Conveyor-Monitored-0.12.tar.gz /usr/bin/make test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports ELIZABETH/Thread-Conveyor-Monitored-0.12.tar.gz Running make install make test had returned bad status, won't install without force Failed during this command: ELIZABETH/Thread-Conveyor-Monitored-0.12.tar.gz: make_test NO What steps do you take to start seeing why an installation failed? I'm not even sure how to begin tracking down what's wrong.

Read the article
Is it faster to create indexes before or after data loading in MySQL?

- by Josh Glover

I have a data replication process that drops and recreates a few tables in a target database, then loads them up with data from a source database (running on another host, but that is immaterial to the question at hand). The target database does need primary keys and a few other indexes on its tables, but not during the data loading. I'm currently loading all of the data, then creating the indexes. However, index creation takes a pretty long time--30 minutes of my data loader's 5 and a half hour running time. My intuition tells me that creating the indexes at the end should be faster than creating them first, since the index would need to be rewritten with each insert. Can anyone tell me for sure which way is faster? FWIW, I'm running MySQL 5.1 with InnoDB tables.

Read the article
Is it useful check data integrity in one DAT tape?

- by maxim

I backup my data every day on tape using one drive DAT HP Storageworks DAT 160. I use one tape for every day and I turn them weekly. Every monday I check one tape randomly recover some files saved on it. I know that when data is saved on tape, the driver and backup software check data integrity, but I wonder if a manual check of some data saved has a sense or not. I re-use these tapes many times and I would be sure data are safe.

Read the article
SQL SERVER – Weekly Series – Memory Lane – #039

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 FQL – Facebook Query Language Facebook list following advantages of FQL: Condensed XML reduces bandwidth and parsing costs. More complex requests can reduce the number of requests necessary. Provides a single consistent, unified interface for all of your data. It’s fun! UDF – Get the Day of the Week Function The day of the week can be retrieved in SQL Server by using the DatePart function. The value returned by the function is between 1 (Sunday) and 7 (Saturday). To convert this to a string representing the day of the week, use a CASE statement. UDF – Function to Get Previous And Next Work Day – Exclude Saturday and Sunday While reading ColdFusion blog of Ben Nadel Getting the Previous Day In ColdFusion, Excluding Saturday And Sunday, I realize that I use similar function on my SQL Server Database. This function excludes the Weekends (Saturday and Sunday), and it gets previous as well as next work day. Complete Series of SQL Server Interview Questions and Answers Data Warehousing Interview Questions and Answers – Introduction Data Warehousing Interview Questions and Answers – Part 1 Data Warehousing Interview Questions and Answers – Part 2 Data Warehousing Interview Questions and Answers – Part 3 Data Warehousing Interview Questions and Answers Complete List Download 2008 Introduction to Log Viewer In SQL Server all the windows event logs can be seen along with SQL Server logs. Interface for all the logs is same and can be launched from the same place. This log can be exported and filtered as well. DBCC SHRINKFILE Takes Long Time to Run If you are DBA who are involved with Database Maintenance and file group maintenance, you must have experience that many times DBCC SHRINKFILE operations takes a long time but any other operations with Database are relatively quicker. mssqlsystemresource – Resource Database The purpose of resource database is to facilitates upgrading to the new version of SQL Server without any hassle. In previous versions whenever version of SQL Server was upgraded all the previous version system objects needs to be dropped and new version system objects to be created. 2009 Puzzle – Write Script to Generate Primary Key and Foreign Key In SQL Server Management Studio (SSMS), there is no option to script all the keys. If one is required to script keys they will have to manually script each key one at a time. If database has many tables, generating one key at a time can be a very intricate task. I want to throw a question to all of you if any of you have scripts for the same purpose. Maximizing View of SQL Server Management Studio – Full Screen – New Screen I had explained the following two different methods: 1) Open Results in Separate Tab - This is a very interesting method as result pan shows up in a different tab instead of the splitting screen horizontally. 2) Open SSMS in Full Screen - This works always and to its best. Not many people are aware of this method; hence, very few people use it to enhance performance. 2010 Find Queries using Parallelism from Cached Plan T-SQL script gets all the queries and their execution plan where parallelism operations are kicked up. Pay attention there is TOP 10 is used, if you have lots of transactional operations, I suggest that you change TOP 10 to TOP 50 This is the list of the all the articles in the series of computed columns. SQL SERVER – Computed Column – PERSISTED and Storage This article talks about how computed columns are created and why they take more storage space than before. SQL SERVER – Computed Column – PERSISTED and Performance This article talks about how PERSISTED columns give better performance than non-persisted columns. SQL SERVER – Computed Column – PERSISTED and Performance – Part 2 This article talks about how non-persisted columns give better performance than PERSISTED columns. SQL SERVER – Computed Column and Performance – Part 3 This article talks about how Index improves the performance of Computed Columns. SQL SERVER – Computed Column – PERSISTED and Storage – Part 2 This article talks about how creating index on computed column does not grow the row length of table. SQL SERVER – Computed Columns – Index and Performance This article summarized all the articles related to computed columns. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 21 of 31 What is Data Warehousing? What is Business Intelligence (BI)? What is a Dimension Table? What is Dimensional Modeling? What is a Fact Table? What are the Fundamental Stages of Data Warehousing? What are the Different Methods of Loading Dimension tables? Describes the Foreign Key Columns in Fact Table and Dimension Table? What is Data Mining? What is the Difference between a View and a Materialized View? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 22 of 31 What is OLTP? What is OLAP? What is the Difference between OLTP and OLAP? What is ODS? What is ER Diagram? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 23 of 31 What is ETL? What is VLDB? Is OLTP Database is Design Optimal for Data Warehouse? If denormalizing improves Data Warehouse Processes, then why is the Fact Table is in the Normal Form? What are Lookup Tables? What are Aggregate Tables? What is Real-Time Data-Warehousing? What are Conformed Dimensions? What is a Conformed Fact? How do you Load the Time Dimension? What is a Level of Granularity of a Fact Table? What are Non-Additive Facts? What is a Factless Facts Table? What are Slowly Changing Dimensions (SCD)? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 24 of 31 What is Hybrid Slowly Changing Dimension? What is BUS Schema? What is a Star Schema? What Snow Flake Schema? Differences between the Star and Snowflake Schema? What is Difference between ER Modeling and Dimensional Modeling? What is Degenerate Dimension Table? Why is Data Modeling Important? What is a Surrogate Key? What is Junk Dimension? What is a Data Mart? What is the Difference between OLAP and Data Warehouse? What is a Cube and Linked Cube with Reference to Data Warehouse? What is Snapshot with Reference to Data Warehouse? What is Active Data Warehousing? What is the Difference between Data Warehousing and Business Intelligence? What is MDS? Explain the Paradigm of Bill Inmon and Ralph Kimball. SQL SERVER – Azure Interview Questions and Answers – Guest Post by Paras Doshi – Day 25 of 31 Paras Doshi has submitted 21 interesting question and answers for SQL Azure. 1.What is SQL Azure? 2.What is cloud computing? 3.How is SQL Azure different than SQL server? 4.How many replicas are maintained for each SQL Azure database? 5.How can we migrate from SQL server to SQL Azure? 6.Which tools are available to manage SQL Azure databases and servers? 7.Tell me something about security and SQL Azure. 8.What is SQL Azure Firewall? 9.What is the difference between web edition and business edition? 10.How do we synchronize On Premise SQL server with SQL Azure? 11.How do we Backup SQL Azure Data? 12.What is the current pricing model of SQL Azure? 13.What is the current limitation of the size of SQL Azure DB? 14.How do you handle datasets larger than 50 GB? 15.What happens when the SQL Azure database reaches Max Size? 16.How many databases can we create in a single server? 17.How many servers can we create in a single subscription? 18.How do you improve the performance of a SQL Azure Database? 19.What is code near application topology? 20.What were the latest updates to SQL Azure service? 21.When does a workload on SQL Azure get throttled? SQL SERVER – Interview Questions and Answers – Guest Post by Malathi Mahadevan – Day 26 of 31 Malachi had asked a simple question which has several answers. Each answer makes you think and ponder about the reality of the IT world. Look at the simple question – ‘What is the toughest challenge you have faced in your present job and how did you handle it’? and its various answers. Each answer has its own story. SQL SERVER – Interview Questions and Answers – Guest Post by Rick Morelan – Day 27 of 31 Rick Morelan of Joes2Pros has written an excellent blog post on the subject how to find top N values. Most people are fully aware of how the TOP keyword works with a SELECT statement. After years preparing so many students to pass the SQL Certification I noticed they were pretty well prepared for job interviews too. Yes, they would do well in the interview but not great. There seemed to be a few questions that would come up repeatedly for almost everyone. Rick addresses similar questions in his lucid writing skills. 2012 Observation of Top with Index and Order of Resultset SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. The real reason may be absolutely different but we may blame something totally different for the incorrect results. Read the blog post to learn more. How do I Record Video and Webcast How to Convert Hex to Decimal or INT Earlier I asked regarding a question about how to convert Hex to Decimal. I promised that I will post an answer with Due Credit to the author but never got around to post a blog post around it. Read the original post over here SQL SERVER – Question – How to Convert Hex to Decimal. Query to Get Unique Distinct Data Based on Condition – Eliminate Duplicate Data from Resultset The natural reaction will be to suggest DISTINCT or GROUP BY. However, not all the questions can be solved by DISTINCT or GROUP BY. Let us see the following example, where a user wanted only latest records to be displayed. Let us see the example to understand further. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Search Results

Search found 63182 results on 2528 pages for 'data driven tests'.

Page 448/2528 | < Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455 | Next Page >

- by Horace Loeb

- by fatcat1111

- by crankharder

- by Arda Eralp

- by willem

- by janesconference

- by HerbalMart

- by user322787

- by Mahran Elneel

- by user123276

- by adoran

- by user315709

- by IT Helpdesk Team Manager

- by Andrew

- by Raphink

- by richard moss

- by tt293

- by ScottKoon

- by user12620111

- by Luke404

- by dwwilson66

- by pythonic metaphor

- by Josh Glover

- by maxim

- by Pinal Dave

< Previous Page | 444 445 446 447 448 449 450 451 452 453 454 455 | Next Page >