document databases - Page 398

SSAS Tabular Workshop online and other upcoming dates (and updates!) #ssas #tabular

- by Marco Russo (SQLBI)

After many conferences and travels, this summer I had some time to write and prepare new sessions for the next wave of conferences. In reality I am just doing that, even if I already restarted traveling for consulting and training. So expect new content about DAX and Tabular coming in the next months! Starting to see real customer adopting Tabular is showing many new challenges and there is still a lot to learn and to create. If you still didn’t started working on Tabular, well, you should. As I always say, as a BI developer you should be able to choose between Tabular and Multidimensional, and in order to do that you should know both of them! One thing that I don’t like very much about marketing is that “Tabular is simpler”, because it’s often translated in “Tabular is for simple projects” when this last statement is not true. Actually, I see a lot of good reasons to adopt Tabular in complex data models, especially in non-traditional scenarios. I know, this is because I love to understand what are the actual limits of a technology, and I’m learning that there is simple a lot of space of improvement also for Tabular. It’s already fast, but it could be faster! How can you start? Well, first of all, by reading our book. Then, by attending to our SSAS Tabular workshop. There is an online edition of the workshop on September 3-4, 2012 (hurry up if you want to register), and there are already several dates planned for the next months (and others will be added soon!). And, of course, by installing SQL Server 2012 and trying to create models over your databases. If you are too lazy, just start with PowerPivot. As soon as you start working with Tabular or PowerPivot, you will see that there is one important skill you need: learning DAX. In the next few days I should publish an article that I’m finishing these days about best practices using SUMMARIZE and ADDCOLUMNS. If only someone published this article one year ago, I would have saved many hours of my life. But, you know, flight manuals are written in blood… and someone has to write! Stay tuned.

Read the article

New hidden parameters in Oracle 11.2

- by Mike Dietrich

We really welcome every external review of our slides. And also recommendations from customers visiting our workshops. So it happened to me more than a week ago that Marco Patzwahl, the owner of MuniqSoft GmbH, had a very lengthy train ride in Germany (as the engine drivers go on strike this week it could have become even worse) and nothing better to do then reviewing our slide set. And he had plenty of recommendations. Besides that he pointed us to something at least I was not aware of and added it to the slides: In patch set 11.2.0.2 a new behaviour for datafile write errors has been implemented. With this release ANY write error to a datafile will cause the instance to abort. Before 11.2.0.2 those errors usually led to an offline datafile if the database operates in archivelog mode (your production database do, don’t they?!) and the datafile does not belong to the SYSTEM tablespace. Internal discussion found this behaviour not up-to-date and alligned with RAC systems and modern storages. Therefore it has been changed and a new underscore parameter got introduced. _DATAFILE_WRITE_ERRORS_CRASH_INSTANCE=TRUE This is the default setting´and the new behaviour beginning with Oracle 11.2.0.2 If you would like to revert to the pre-11.2.0.2 behaviour you’ll have to set in your init.ora/spfile this parameter to false. But keep in mind that there’s a reason why this has been changed. You’ll find more info in MOS Note: 7691270.8 and this topic in the current version of the slides on slide 255. Thanks to Marco for the review!! And then I received an email from Kurt Van Meerbeeck today. Kurt is pretty well known in the Oracle community. And he’s the owner of jDUL/DUDE, a database unloading tool which bypasses the Oracle database engine and access data direclty from the blocks. Kurt visited the upgrade workshop two weeks ago in Belgium and did highlight to me that since Oracle 11.2.0.1 even though you haven’t set neither SGA_TARGET nor MEMORY_TARGET the database might still do resize operations. Reason why this behaviour has been changed: Prevention of ORA-4031 errors. But on databases with extremly high loads this can cause trouble. Further information can be found in MOS Note:1269139.1 . And the parameter set to TRUE by default is called _MEMORY_IMM_MODE_WITHOUT_AUTOSGA=TRUE This can be found now in the slide set as well on slide number 240. And thanks to Kurt for this information!!

Read the article

Adding Column to a SQL Server Table

- by Dinesh Asanka

Adding a column to a table is common task for DBAs. You can add a column to a table which is a nullable column or which has default values. But are these two operations are similar internally and which method is optimal? Let us start this with an example. I created a database and a table using following script: USE master Go --Drop Database if exists IF EXISTS (SELECT 1 FROM SYS.databases WHERE name = 'AddColumn') DROP DATABASE AddColumn --Create the database CREATE DATABASE AddColumn GO USE AddColumn GO --Drop the table if exists IF EXISTS ( SELECT 1 FROM sys.tables WHERE Name = 'ExistingTable') DROP TABLE ExistingTable GO --Create the table CREATE TABLE ExistingTable (ID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED, DateTime1 DATETIME DEFAULT GETDATE(), DateTime2 DATETIME DEFAULT GETDATE(), DateTime3 DATETIME DEFAULT GETDATE(), DateTime4 DATETIME DEFAULT GETDATE(), Gendar CHAR(1) DEFAULT 'M', STATUS1 CHAR(1) DEFAULT 'Y' ) GO -- Insert 100,000 records with defaults records INSERT INTO ExistingTable DEFAULT VALUES GO 100000 Before adding a Column Before adding a column let us look at some of the details of the database. DBCC IND (AddColumn,ExistingTable,1) By running the above query, you will see 637 pages for the created table. Adding a Column You can add a column to the table with following statement. ALTER TABLE ExistingTable Add NewColumn INT NULL Above will add a column with a null value for the existing records. Alternatively you could add a column with default values. ALTER TABLE ExistingTable Add NewColumn INT NOT NULL DEFAULT 1 The above statement will add a column with a 1 value to the existing records. In the below table I measured the performance difference between above two statements. Parameter Nullable Column Default Value CPU 31 702 Duration 129 ms 6653 ms Reads 38 116,397 Writes 6 1329 Row Count 0 100000 If you look at the RowCount parameter, you can clearly see the difference. Though column is added in the first case, none of the rows are affected while in the second case all the rows are updated. That is the reason, why it has taken more duration and CPU to add column with Default value. We can verify this by several methods. Number of Pages The number of data pages can be obtained by using DBCC IND command. Though, this an undocumented dbcc command, many experts are ok to use this command in production. However, since there is no official word from Microsoft, use this “at your own risk”. DBCC IND (AddColumn,ExistingTable,1) Before Adding the Columns 637 Adding a Column with NULL 637 Adding a column with DEFAULT value 1270 This clearly shows that pages are physically modified. Please note, a high value indicated in the Adding a column with DEFAULT value column is also a result of page splits. Continues…

Read the article

It's called College.

- by jeffreyabecker

Today I saw yet another 'GUID vs int as your primary key' article. Like most of the ones I've read this was filled with technical misrepresentations and out-right fallices. Chef's famous line that "There's a time and a place for everything children" applies here. GUIDs have distinct advantages and disadvantages which should be considered when choosing a data type for the primary key. Fallacy 1: "Its easier" An integer data type(tinyint, smallint, int, bigint) is a better artifical key than a GUID because its easier to remember. I'm a firm believer that your artifical primary keys should be opaque gibberish. PK's are an implementation detail which should never be exposed to the user or relied on for business logic. If you want things to come back in an order, add and ORDER BY clause and SortOrder fields. If you want a human-usable look-up add a business key with a unique constraint. If you want to know what order things were inserted into a table add a timestamp. Fallacy 2: "Size Matters" For many applications, the size of the artifical primary key is going to be irrelevant. The particular article which kicked this post off stated repeatedly that joining against an int has better performance than joining against a GUID. In computer science the performance of your algorithm is always a function of the number of data points. This still holds true for databases. Unless your table is very large, the performance difference between an int and a guid probably isnt going to be mesurable let alone noticeable. My personal experience is that the performance becomes an issue when you start having billions of rows in the table. At this point, you should probably start looking to move from int to bigint so the effective space/performance gain isnt as much as you'd think. GUID Advantages: Insert-ability / Mergeability: You can reliably insert guids into tables without key collisions. Database Independence: Saving entities to the database often requires knowing ids. With identity based ids the id must be selected back after every insert. GUIDs can be generated application-side allowing much faster inserts. GUID Disadvantages: Generatability: You can calculate the next id for an integer pk pretty easily in your head but will need a program to generate GUIDs. Solution: "Select top 100 newid() from sysobjects" Fragmentation: most GUID generation algorithms generate pseudo random GUIDs. This can cause inserts into the middle of your clustered index. Solutions: add a default of newsequentialid() or use GuidComb in NHibernate.

Read the article

Should CSS be listed on your resume under Languages?

- by Sandeepan Nath

I have some doubts like Whether CSS should be put under Languages or not? Although Wikipedia says Cascading Style Sheets (CSS) is a style sheet language ... But do they write CSS under the languages section of the resume, along with PHP, etc? Similarly what about HTML? I have some doubt and I don't want to sound like someone who is not aware of the trends. Just to give an example, currently I have the following languages,frameworks, technologies, etc. listed under the "Technical Expertise" section of my resume - Technical Expertise * Languages - Proficient - PHP 5, Javascript, HTML ?*,CSS ?*,Sass ?*. Beginner - Linux Bash. * Databases - MySQL 5. * Technologies - AJAX. * Frameworks/Libraries - Symfony, jQuery. * CMSes - Wordpress. Although my domain is Web-development/design, I welcome domain-agnostic answers which can provide some generic ideas/reasoning. I have seen, a lot of people messing up these sections (even more serious than my doubts :) ), putting things under wrong sub-headings and thus putting a big question mark on their understanding of those things. I don't know much about XML, Comet Technology etc. Considering those are included too, What things should be put under Languages? E.g. Should CSS be put under Languages? Please give some reasoning to support your views. Where should the others (XML, Comet, cURL etc. ) be put? I welcome some examples of how you put it. Or do you have an additional Keywords section where you write all the unsortables ? Considering a set of standards like W3C standards, etc. do you have a standards sub-heading? I guess I have put the contents of other sections Okay. But do let me know about your ideas and reasoning. After all, I understand there may not be a single answer to this, but let's see what is the trend. Thanks Updates Further, do you mention design patters you have used? Web Services etc.? Where do you mention SOAP, XML etc...

Read the article

Tackling Big Data Analytics with Oracle Data Integrator

- by Irem Radzik

v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} By Mike Eisterer The term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and documents that can be mined for useful information. Companies are facing emerging technologies, increasing data volumes, numerous data varieties and the processing power needed to efficiently analyze data which changes with high velocity. Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships Oracle Data Integrator Enterprise Edition(ODI) is critical to any enterprise big data strategy. ODI and the Oracle Data Connectors provide native access to Hadoop, leveraging such technologies as MapReduce, HDFS and Hive. Alongside with ODI’s metadata driven approach for extracting, loading and transforming data; companies may now integrate their existing data with big data technologies and deliver timely and trusted data to their analytic and decision support platforms. In this session, you’ll learn about ODI and Oracle Big Data Connectors and how, coupled together, they provide the critical integration with multiple big data platforms. Tackling Big Data Analytics with Oracle Data Integrator October 1, 2012 12:15 PM at MOSCONE WEST – 3005 For other data integration sessions at OpenWorld, please check our Focus-On document. If you are not able to attend OpenWorld, please check out our latest resources for Data Integration.

Read the article

Big Data – Buzz Words: What is NewSQL – Day 10 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the relational database. In this article we will take a quick look at the what is NewSQL. What is NewSQL? NewSQL stands for new scalable and high performance SQL Database vendors. The products sold by NewSQL vendors are horizontally scalable. NewSQL is not kind of databases but it is about vendors who supports emerging data products with relational database properties (like ACID, Transaction etc.) along with high performance. Products from NewSQL vendors usually follow in memory data for speedy access as well are available immediate scalability. NewSQL term was coined by 451 groups analyst Matthew Aslett in this particular blog post. On the definition of NewSQL, Aslett writes: “NewSQL” is our shorthand for the various new scalable/high performance SQL database vendors. We have previously referred to these products as ‘ScalableSQL‘ to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all the products, we adopted the term ‘NewSQL’ in the new report. And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL. In other words - NewSQL incorporates the concepts and principles of Structured Query Language (SQL) and NoSQL languages. It combines reliability of SQL with the speed and performance of NoSQL. Categories of NewSQL There are three major categories of the NewSQL New Architecture – In this framework each node owns a subset of the data and queries are split into smaller query to sent to nodes to process the data. E.g. NuoDB, Clustrix, VoltDB MySQL Engines – Highly Optimized storage engine for SQL with the interface of MySQ Lare the example of such category. E.g. InnoDB, Akiban Transparent Sharding – This system automatically split database across multiple nodes. E.g. Scalearc Summary In simple words – NewSQL is kind of database following relational database principals and provides scalability like NoSQL. Tomorrow In tomorrow’s blog post we will discuss about the Role of Cloud Computing in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Swiss Re increases data warehouse performance and deploys in record time

- by KLaker

Great information on yet another data warehouse deployment on Exadata. A little background on Swiss Re: In 2002, Swiss Re established a data warehouse for its client markets and products to gather reinsurance information across all organizational units into an integrated structure. The data warehouse provided the basis for reporting at the group level with drill-down capability to individual contracts, while facilitating application integration and data exchange by using common data standards. Initially focusing on property and casualty reinsurance information only, it now includes life and health reinsurance, insurance, and nonlife insurance information. Key highlights of the benefits that Swiss Re achieved by using Exadata: Reduced the time to feed the data warehouse and generate data marts by 58% Reduced average runtime by 24% for standard reports comfortably loading two data warehouse refreshes per day with incremental feeds Freed up technical experts by significantly minimizing time spent on tuning activities Most importantly this was one of the fastest project deployments in Swiss Re's history. They went from installation to production in just four months! What is truly surprising is the that it only took two weeks between power-on to testing the machine with full data volumes! Business teams at Swiss Re are now able to fully exploit up-to-date analytics across property, casualty, life, health insurance, and reinsurance lines to identify successful products. These points are highlighted in the following quotes from Dr. Stephan Gutzwiller, Head of Data Warehouse Services at Swiss Re: "We were operating a complete Oracle stack, including servers, storage area network, operating systems, and databases that was well optimized and delivered very good performance over an extended period of time. When a hardware replacement was scheduled for 2012, Oracle Exadata was a natural choice—and the performance increase was impressive. It enabled us to deliver analytics to our internal customers faster, without hiring more IT staff" “The high quality data that is readily available with Oracle Exadata gives us the insight and agility we need to cater to client needs. We also can continue re-engineering to keep up with the increasing demand without having to grow the organization. This combination creates excellent business value.” Our full press release is available here: http://www.oracle.com/us/corporate/customers/customersearch/swiss-re-1-exadata-ss-2050409.html. If you want more information about how Exadata can increase the performance of your data warehouse visit our home page: http://www.oracle.com/us/products/database/exadata-database-machine/overview/index.html

Read the article

SQL Server MVP Deep Dives 2. The Awesome Returns.

- by Mladen Prajdic

Two years ago 59 SQL Server MVP's came together and helped make one of the best book on SQL Server out there. Each chapter was written by an MVP about a part of SQL Server they loved working with. This resulted in superb quality content and excellent ratings from the readers. To top it off all earnings went to a good cause, the War Child International organization. That book was SQL Server MVP Deep Dives. This year 63 SQL Server MVPs, me included, decided it was time do repeat the success of the first book. Let me introduce you the: SQL Server MVP Deep Dives 2 The topics in 60 chapters are grouped in 5 groups: Architecture, Database Administration, Database Development, Performance Tuning and Optimization, Business Intelligence. They represent over 1000 years of daily experience in various areas of SQL Server. I have contributed chapter 28 in Database Development group titled Getting asynchronous with Service Broker. In it I show you the Service Broker template you can use for secure communication between two or more SQL server instances for whatever purpose you may have. If you haven't heard of Service Broker it's a part of the database engine that enables you to do completely async operations in the database itself or between databases and instances. The official release of the book will be next week at PASS where there will be 2 slots where most of the authors will be there signing the books you bring. This is also a great opportunity to meet everyone and ask about any problems you may have. So definitely come say hi. Again we decided on a charity that will be supported by this book. It's called Operation Smile. They provide free surgeries to repair cleft lip, cleft palate and other facial deformities for children around the globe. You can also help them by donating. You can preorder it on at Manning Publications website or on Amazon. By having it you not only get to learn a lot, improve your skills and have fun but you also help a child have a normal life. If that's not a good cause then I don't know what it is.

Read the article

Bay Area Coherence Special Interest Group Next Meeting July 21, 2011

- by csoto

Date: Thursday, July 21, 2011 Time: 4:30pm - 8:15pm ET (note that Parking at 475 Sansome Closes at 8:30pm) Where: Oracle Office, 475 Sansome Street, San Francisco, CA Google Map We will be providing snacks and beverages. Register! - Registration is required for building security. Presentation Line Up:? 5:10pm - Batch Processing Using Coherence in Oracle Group Policy Administration - Paul Cleary, Oracle Oracle Insurance Policy Administration (OIPA) is a flexible, rules-based policy administration solution that provides full record keeping for all policy lifecycle transactions. One component of OIPA is Cycle processing, which is the batch processing of pending insurance transactions. This presentation introduces OIPA and Cycle processing, describing the unique challenges of processing a high volume of transactions within strict time windows. It then reviews how OIPA uses Oracle Coherence and the Processing Pattern to meet these challenges, describing implementation specifics that highlight the simplicity and robustness of the Processing Pattern. 6:10pm - Secure, Optimize, and Load Balance Coherence with F5 - Chris Akker, F5 F5 Networks, Inc., the global leader in Application Delivery Networking, helps the world’s largest enterprises and service providers realize the full value of virtualization, cloud computing, and on-demand IT. Recently, F5 and Oracle partnered to deliver a novel solution that integrates Oracle Coherence 3.7 with F5 BIG-IP Local Traffic Manager (LTM). This session will introduce F5 and how you can leverage BIG-IP LTM to secure, optimize, and load balance application traffic generated from Coherence*Extend clients across any number of servers in a cluster and to hardware-accelerate CPU-intensive SSL encryption. 7:10pm - Using Oracle Coherence to Enable Database Partitioning and DC Level Fault Tolerance - Alexei Ragozin, Independent Consultant and Brian Oliver, Oracle Partitioning is a very powerful technique for scaling database centric applications. One tricky part of partitioned architecture is routing of requests to the right database. The routing layer (routing table) should know the right database instance for each attribute which may be used for routing (e.g. account id, login, email, etc): it should be fast, it should fault tolerant and it should scale. All the above makes Oracle Coherence a natural choice for implementing such routing tables in partitioned architectures. This presentation will cover synchronization of the grid with multiple databases, conflict resolution, cross cluster replication and other aspects related to implementing robust partitioned architecture. Additional Info:?? - Download Past Presentations: The presentations from the previous meetings of the BACSIG are available for download here. Click on the presentation titles to download the PDF files. - Join the Coherence online community on our Oracle Coherence Users Group on LinkedIn. - Contact BACSIG with any comments, questions, presentation proposals and content suggestions.

Read the article

Happy New Year! Upcoming Events in January 2011

- by mandy.ho

Oracle Database kicks off the New Year at the following events during the month of January. Hope to see you there and please send in your pictures and feedback! Jan 20, 2011 - San Francisco, CA LinkShare Symposium West 2011 Oracle is a proud Gold Sponsor at the LinkShare Symposium West 2011 January 20 in San Francisco, California. Year after year LinkShare has been bringing their network the opportunity to come to life. At the LinkShare Symposium online performance marketing leaders meet to optimize face-to-face during a full day of networking. Learn more by attending Oracle Breakout Session, "Omni - Channel Retailing, What is possible now?" on Thursday, January 20, 11:15 a.m. - 12:00 noon, Grand Ballroom. http://eventreg.oracle.com/webapps/events/ns/EventsDetail.jsp?p_eventId=128306&src=6954634&src=6954634&Act=397 Jan 24, 2011 - Cincinnati, OH Greater Cincinnati Oracle User Group Meeting "Tom Kyte Day" - Featuring a day of sessions presented by Senior Technical Architect, Tom Kyte. Sessions include "Top 10, no 11, new features of Oracle Database 11g Release 2" and "What do I really need to know when upgrading", plus more. http://www.gcoug.org/ Jan 25, 2011 - Vancouver, British Columbia Oracle Security Solutions Forum Featuring a Special Keynote Presentation from Tom Kyte - Complete Database Security Join us at this half-day event; Oracle Database Security Solutions: Complete Information Security. Learn how Oracle Database Security solutions help you: • Prevent external threats like SQL injection attacks from reaching your databases • Transparently encrypt application data without application changes • Prevent privileged database users and administrators from accessing data • Use native database auditing to monitor and report on database activity • Mask production data for safe use in nonproduction environments http://eventreg.oracle.com/webapps/events/ns/EventsDetail.jsp?p_eventId=126974&src=6958351&src=6958351&Act=97 Jan 26, 2011 - Halifax, Nova Scotia Oracle Database Security Technology Day Exclusive Seminar on Complete Information Security with Oracle Database 11g The amount of digital data within organizations is growing at unprecedented rates, as is the value of that data and the challenges of safeguarding it. Yet most IT security programs fail to address database security--specifically, insecure applications and privileged users. So how can you protect your mission-critical information? Avoid risky third-party solutions? Defend against security breaches and compliance violations? And resist costly new infrastructure investments? Join us at this half-day seminar, Oracle Database Security Solutions: Complete Information Security, to find out http://eventreg.oracle.com/webapps/events/ns/EventsDetail.jsp?p_eventId=126269&src=6958351&src=6958351&Act=93

Read the article

Becoming an Expert MySQL DBA Across Five Continents

- by Antoinette O'Sullivan

You can take Oracle's MySQL Database Administrator training on five contents. In this 5-day, live, instructor-led course, you learn to install and optimize the MySQL Server, set up replication and security, perform database backups and performance tuning, and protect MySQL databases. Below is a selection of the in-class events already on the schedule for the MySQL for Database Administrators course. AFRICA Location Date Delivery Language Nairobi, Kenya 22 July 2013 English Johannesburg, South Africa 9 December 2013 English AMERICA Location Date Delivery Language Belmont, California, United States 22 July 2013 English ASIA Location Date Delivery Language Dehradun, India 11 July 2013 English Grogol - Jakarta Barat, Indonesia 16 September 2013 English Makati City, Philippines 5 August 2013 English Pasig City, Philippines 12 August 2013 English Istanbul, Turkey 12 August 2013 Turkish AUSTRALIA and OCEANIA Location Date Delivery Language Sydney, Australia 15 July 2013 English Auckland, New Zealand 5 August 2013 English Wellington, New Zealand 15 July 2013 English EUROPE Location Date Delivery Language London, England 9 September 2013 English Aix-en-Provence, France 2 December 2013 French Bordeaux Merignac, France 2 December 2013 French Puteaux, France 16 September 2013 French Dresden, Germany 26 August 2013 German Hamburg, Germany 16 November 2013 German Munich, Germany 19 August 2013 German Munster, Germany 9 September 2013 German Budapest, Hungary 4 November 2013 Hungarian Belfast, Ireland 16 December 2013 English Milan, Italy 7 October 2013 Italian Rome, Italy 16 September 2013 Italian Utrecht, Netherlands 16 September 2013 English Warsaw, Poland 5 August 2013 Polish Lisbon, Portugal 16 September 2013 European Portugese Barcelona, Spain 30 October 2013 Spanish Madrid, Spain 4 November 2013 Spanish Bern, Switzerland 27 November 2013 German Zurich, Switzerland 27 November 2013 German You can also take this course from your own desk as a live-virtual class, choosing from a wide selection of events already on the schedule suiting different timezones. To register for this course or to learn more about the authentic MySQL curriculum, go to http://oracle.com/education/mysql.

Read the article

Backup Azure Tables with the Enzo Backup API

- by Herve Roggero

In case you missed it, you can now backup (and restore) Azure Tables and SQL Databases using an API directly. The features available through the API can be found here: http://www.bluesyntax.net/backup20api.aspx and the online help for the API is here: http://www.bluesyntax.net/EnzoCloudBackup20/APIIntro.aspx. Backing up Azure Tables can’t be any easier than with the Enzo Backup API. Here is a sample code that does the trick: // Create the backup helper class. The constructor automatically sets the SourceStorageAccount property StorageBackupHelper backup = new StorageBackupHelper("storageaccountname", "storageaccountkey", "sourceStorageaccountname", "sourceStorageaccountkey", true, "apilicensekey"); // Now set some properties… backup.UseCloudAgent = false; // backup locally backup.DeviceURI = @"c:\TMP\azuretablebackup.bkp"; // to this file backup.Override = true; backup.Location = DeviceLocation.LocalFile; // Set optional performance options backup.PKTableStrategy.Mode = BSC.Backup.API.TableStrategyMode.GUID; // Set GUID strategy by default backup.MaxRESTPerSec = 200; // Attempt to stay below 200 REST calls per second // Start the backup now… string taskId = backup.Backup(); // Use the Environment class to get the final status of the operation EnvironmentHelper env = new EnvironmentHelper("storageaccountname", "storageaccountkey", "apilicensekey"); string status = env.GetOperationStatus(taskId); As you can see above, the code is straightforward. You provide connection settings in the constructor, set a few options indicating where the backup device will be located, set optional performance parameters and start the backup. The performance options are designed to help you backup your Azure Tables quickly, while attempting to keep under a specific threshold to prevent Storage Account throttling. For example, the MaxRESTPerSec property will attempt to keep the overall backup operation under 200 rest calls per second. Another performance option if the Backup Strategy for Azure Tables. By default, all tables are simply scanned. While this works best for smaller Azure Tables, larger tables can use the GUID strategy, which will issue requests against an Azure Table in parallel assuming the PartitionKey stores GUID values. It doesn’t mean that your PartitionKey must have GUIDs however for this strategy to work; but the backup algorithm is tuned for this condition. Other options are available as well, such as filtering which columns, entities or tables are being backed up. Check out more on the Blue Syntax website at http://www.bluesyntax.net.

Read the article

What is new in Oracle SOA Suite 11g R1 PS6? by Shanny Anoep

- by JuergenKress

Oracle has released a new version 11.1.1.7.0 for their Oracle Fusion Middleware product line. This version includes Patch Set #6 (PS6) for Oracle SOA Suite 11g R1, with a big list of improvements and fixes for each component in that suite. In this post we will highlight some of the interesting updates with regards to troubleshooting, performance, reliability and scalability. Infrastructure/Purging scripts Database growth is a common problem for large-scale Oracle SOA Suite deployments. Oracle already provides multiple purging strategies for the SOA Suite runtime database. This patch set includes two new scripts for purging most of the runtime data: Table Recreation Script (TRS): This script can be used to reclaim as much database space as possible, while still retaining the open instances. It can be used as a corrective action for databases that grew excessively, for example when purging was not performed at all. This should be used as a single corrective action only; the script does not replace the normal purging scripts. Truncate script: Remove all records from the SOA Suite runtime tables without dropping the tables. This script can be used for cloning SOA Suite environments without copying the instance data, or for recreating test scenarios by cleaning all the runtime data. The Oracle SOA Suite Administrator's guide contains a table with the available purging strategies. Diagnostic dumps Using WLST you could already dump diagnostic information about various components of the SOA Suite. This version adds support to retrieve more information on BPEL and Adapters from the command-line. Diagnostic dumps for BPEL New diagnostic dumps are available for BPEL to get information on thread pools, average processing time for BPEL components, and average waiting times for asynchronous instances. This information can be very useful for performance analysis or troubleshooting. With WLST this information can be retrieved from the command-line and included for monitoring or reporting. Read the full article here. SOA & BPM Partner Community For regular information on Oracle SOA Suite become a member in the SOA & BPM Partner Community for registration please visit www.oracle.com/goto/emea/soa (OPN account required) If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Facebook Wiki Mix Forum Technorati Tags: SOA Suite PS6,SOA Community,Oracle SOA,Oracle BPM,Community,OPN,Jürgen Kress

Read the article

What can a Service do on Windows?

- by Akemi Iwaya

If you open up Task Manager or Process Explorer on your system, you will see many services running. But how much of an impact can a service have on your system, especially if it is ‘corrupted’ by malware? Today’s SuperUser Q&A post has the answers to a curious reader’s questions. Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites. The Question SuperUser reader Forivin wants to know how much impact a service can have on a Windows system, especially if it is ‘corrupted’ by malware: What kind malware/spyware could someone put into a service that does not have its own process on Windows? I mean services that use svchost.exe for example, like this: Could a service spy on my keyboard input? Take screenshots? Send and/or receive data over the internet? Infect other processes or files? Delete files? Kill processes? How much impact could a service have on a Windows installation? Are there any limits to what a malware ‘corrupted’ service could do? The Answer SuperUser contributor Keltari has the answer for us: What is a service? A service is an application, no more, no less. The advantage is that a service can run without a user session. This allows things like databases, backups, the ability to login, etc. to run when needed and without a user logged in. What is svchost? According to Microsoft: “svchost.exe is a generic host process name for services that run from dynamic-link libraries”. Could we have that in English please? Some time ago, Microsoft started moving all of the functionality from internal Windows services into .dll files instead of .exe files. From a programming perspective, this makes more sense for reusability…but the problem is that you can not launch a .dll file directly from Windows, it has to be loaded up from a running executable (exe). Thus the svchost.exe process was born. So, essentially a service which uses svchost is just calling a .dll and can do pretty much anything with the right credentials and/or permissions. If I remember correctly, there are viruses and other malware that do hide behind the svchost process, or name the executable svchost.exe to avoid detection. Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.

Read the article

New Slides - and a discussion about Dictionary Statistics

- by Mike Dietrich

First of all we have just upoaded a new version of the Upgrade and Migration Workshop slides with some added information. So please feel free to download them from here.The slides have one new interesting information which lead to a discussion I've had in the past days with a very large customer regarding their upgrades - and internally on the mailing list targeting an EBS database upgrade from Oracle 10.2 to Oracle 11.2. Why are we creating dictionary statistics during upgrade? I'd believe this forced dictionary statistics creation got introduced with the desupport of the Rule Based Optimizer in Oracle 10g. The goal: as RBO is not supported anymore we have to make sure that the data dictionary has fresh and non-stale statistics. Actually that would have led in Oracle 9i to strange behaviour in some databases - so in Oracle 9i this was strongly disrecommended. The upgrade scripts got hardcoded to create these stats. But during tests we had the following findings: It's important to create dictionary statistics the night before the upgrade. Not two weeks before, not 60 minutes before your downtime begins. But very close to the upgrade. From Oracle 10g onwards you'd just say: $ execute DBMS_STATS.GATHER_DICTIONARY_STATS; This is important to make sure you have fresh dictionary statistics during upgrade for performance reasons. Tests have shown that running an upgrade without valid dictionary statistics might slow down the whole upgrade by factors of 2x-3x. And it would be also a great idea post upgrade to create again fresh dictionary statistics when you've did suppress the stats creation during the upgrade process. Suppress? Yes, you could set this underscore parameter in the init.ora: _optim_dict_stats_at_db_cr_upg=FALSE to suppress the forced dictionary statistics collection during an upgrade. We believe strongly that (a) people using the default statistics creation process which will create dictionary statistics by default and (b) create fresh stats before upgrade on the dictionary. Therefore we find it save once you have followed our advice to use the underscore during upgrade. And we've taken out that forced statistics collection during upgrade in the next release of the database. Please note: If you are using the DBUA for the upgrade it will remove underscore parameters for the upgrade run to improve performance - which is generally a good idea. So you'll have to start the DBUA with that call: $ dbua -initParam "_optim_dict_stats_at_cb_cr_upg"=FALSE -Mike

Read the article

Encrypting your SQL Server Passwords in Powershell

- by laerte

A couple of months ago, a friend of mine who is now bewitched by the seemingly supernatural abilities of Powershell (+1 for the team) asked me what, initially, appeared to be a trivial question: "Laerte, I do not have the luxury of being able to work with my SQL servers through Windows Authentication, and I need a way to automatically pass my username and password. How would you suggest I do this?" Given that I knew he, like me, was using the SQLPSX modules (an open source project created by Chad Miller; a fantastic library of reusable functions and PowerShell scripts), I merrily replied, "Simply pass the Username and Password in SQLPSX functions". He rather pointed responded: "My friend, I might as well pass: Username-'Me'-password 'NowEverybodyKnowsMyPassword'" As I do have the pleasure of working with Windows Authentication, I had not really thought this situation though yet (and thank goodness I only revealed my temporary ignorance to a friend, and the embarrassment was minimized). After discussing this puzzle with Chad Miller, he showed me some code for saving passwords on SQL Server Tables, which he had demo'd in his Powershell ETL session at Tampa SQL Saturday (and you can download the scripts from here). The solution seemed to be pretty much ready to go, so I showed it to my Authentication-impoverished friend, only to discover that we were only half-way there: "That's almost what I want, but the details need to be stored in my local txt file, together with the names of the servers that I'll actually use the Powershell scripts on. Something like: Server1,UserName,Password Server2,UserName,Password" I thought about it for just a few milliseconds (Ha! Of course I'm not telling you how long it actually took me, I have to do my own marketing, after all) and the solution was finally ready. First , we have to download Library-StringCripto (with many thanks to Steven Hystad), which is composed of two functions: One for encryption and other for decryption, both of which are used to manage the password. If you want to know more about the library, you can see more details in the help functions. Next, we have to create a txt file with your encrypted passwords:$ServerName = "Server1" $UserName = "Login1" $Password = "Senha1" $PasswordToEncrypt = "YourPassword" $UserNameEncrypt = Write-EncryptedString -inputstring $UserName -Password $PasswordToEncrypt $PasswordEncrypt = Write-EncryptedString -inputstring $Password -Password $PasswordToEncrypt "$($Servername),$($UserNameEncrypt),$($PasswordEncrypt)" | Out-File c:\temp\ServersSecurePassword.txt -Append $ServerName = "Server2" $UserName = "Login2" $Password = "senha2" $PasswordToEncrypt = "YourPassword" $UserNameEncrypt = Write-EncryptedString -inputstring $UserName -Password $PasswordToEncrypt $PasswordEncrypt = Write-EncryptedString -inputstring $Password -Password $PasswordToEncrypt "$($Servername),$($UserNameEncrypt),$($PasswordEncrypt)" | Out-File c:\temp\ ServersSecurePassword.txt -Append .And in the c:\temp\ServersSecurePassword.txt file which we've just created, you will find your Username and Password, all neatly encrypted. Let's take a look at what the txt looks like: .and in case you're wondering, Server names, Usernames and Passwords are all separated by commas. Decryption is actually much more simple:Read-EncryptedString -InputString $EncryptString -password "YourPassword" (Just remember that the Password you're trying to decrypt must be exactly the same as the encrypted phrase.) Finally, just to show you how smooth this solution is, let's say I want to use the Invoke-DBMaint function from SQLPSX to perform a checkdb on a system database: it's just a case of split, decrypt and be happy!Get-Content c:\temp\ServerSecurePassword.txt | foreach { [array] $Split = ($_).split(",") Invoke-DBMaint -server $($Split[0]) -UserName (Read-EncryptedString -InputString $Split[1] -password "YourPassword" ) -Password (Read-EncryptedString -InputString $Split[2] -password "YourPassword" ) -Databases "SYSTEM" -Action "CHECK_DB" -ReportOn c:\Temp } This is why I love Powershell.

Read the article

How Big Data and Social Won the Election

- by Mike Stiles

The story of big data’s influence on the outcome of the US Presidential election is worth a good look, because a) it’s a harbinger of things to come, and b) it’s an example of similar successes available to any enterprise seriously resourcing integrated big data, modeling, and data-driven execution on all assets, including social. Obama campaign manager Jim Messina fielded a data and analytics brain trust 5 times larger than 2008. At that time, there were numerous databases from various sources, few of them talking to each other. This time, the mission was to be metrics-centered and measure everything measurable, and in context with all the other data. Big data showed them exactly what they needed to know and told them what to do about it. It showed them women 40-49 on the west coast would donate big money if they got to eat with George Clooney. Women on the east coast would pony up to hang out with Sarah Jessica Parker. Extensive daily modeling showed them what kinds of email appeals, from who, and to whom, would prove most successful in raising cash, recruiting volunteers, and getting out the vote. Swing state voters were profiled and approached with more customized targeting that at any time in history. Ads were purchased on specific shows watched by the targets, increasing efficiency 14% over traditional media buys. For all the criticism of the candidate’s focus on appearing on comedy and entertainment shows, and local radio morning shows, that’s where the data sent them to reach the voters most likely to turn out for them. And then there was social. Again, more than in any other election, Facebook was used for virtual, highly efficient door-to-door canvasing. Facebook fans got pictures of friends in swing states and were asked to encourage them to act. Using that approach, 1 in 5 peer-to-peer appeals led to the desired action. Assumptions, gut, intuition, campaign experience, all took a backseat to strategy shifts solidly backed up by data. Zeroing in on demographics likely to back the President and tracking their mood daily literally changed the voter landscape. The Romney team watched Obama voters appear seemingly out of thin air. One Obama campaign aide said, “We ran the election 66,000 times every night.” Which brings us to your organization. If you’re starting to feel like the battle-cry of “but this is the way we’ve always done it” is starting to put you in an extremely vulnerable position, you’re right. Social has become a key communication tool of the 21st century. Failing to use it, or failing to invest in a deep understanding of who your customers and prospects are so the content you post there will achieve desired actions and results, will leave you waking up one morning wondering, “What happened?”@mikestilesPhoto stock.xchng

Read the article

Installing MOSS 2007 on Windows 2008 R2

- by Manesh Karunakaran

When you try to install MOSS 2007 on Windows 2008 R2, if you are using an installation media that is older than SP2, you would get the following error, saying that “This program is blocked due to compatibility issues” All is not lost though, all you need to do is to slip stream the SP2 updates to the MOSS 2007 Setup. Here’s a nice how to on how to do that. http://blogs.technet.com/seanearp/archive/2009/05/20/slipstreaming-sp2-into-sharepoint-server-2007.aspx Once you slipstream the SP2 updates, you would be able to continue with the installation with out the above error. HTH. You may already read from blogs about April Cumulative Update for separate components in SharePoint. Now, the server-packages (also known as “Uber” packages) of April Cumulative Update for Microsoft Office SharePoint Server 2007 and Windows SharePoint Services 3.0 are ready for download. Download Information Windows SharePoint Services 3.0 April cumulative update package http://support.microsoft.com/hotfix/KBHotfix.aspx?kbnum=968850 Office SharePoint Server 2007 April cumulative update package http://support.microsoft.com/hotfix/KBHotfix.aspx?kbnum=968851 Detail Description Description of the Windows SharePoint Services 3.0 April cumulative update package http://support.microsoft.com/kb/968850 Description of the Office SharePoint Server 2007 April cumulative update package http://support.microsoft.com/kb/968851 Installation Recommendation for a fresh SharePoint Server To keep all files in a SharePoint installation up-to-date, the following sequence is recommended. Service Pack 2 for Windows SharePoint Services 3.0 Service Pack 2 for Office SharePoint Server 2007 April Cumulative Update package for Windows SharePoint Services 3.0 April Cumulative Update package for Office SharePoint Server 2007 Please note: Start from April Cumulative Update, the packages will no longer install on a farm without a service pack installed. You must have installed either Service Pack 1 (SP1) or SP2 prior to the installation of the cumulative updates. After applying the preceding updates, run the SharePoint Products and Technologies Configuration Wizard or “psconfig –cmd upgrade –inplace b2b -wait” in command line. This needs to be done on every server in the farm with SharePoint installed. The version of content databases should be 12.0.6504.5000 after successfully applying these updates. For more in-depth guidance for the update process, we recommend that customers refer to the following articles. These articles provide a correct way to deploy updates, identify known issues (and resolutions), and provide information about creating slipstream builds. Deploy software updates for Windows SharePoint Services 3.0 http://technet.microsoft.com/en-us/library/cc288269.aspx Deploy software updates for Office SharePoint Server 2007 http://technet.microsoft.com/en-us/library/cc263467.aspx Create an installation source that includes software updates (Windows SharePoint Services 3.0) http://technet.microsoft.com/en-us/library/cc287882.aspx Create an installation source that includes software updates (Office SharePoint Server 2007) http://technet.microsoft.com/en-us/library/cc261890.aspx

Read the article

Restoring MSDB

- by David-Betteridge

We recently performed a disaster recovery exercise which included the restoration of the MSDB database onto our DR server. I did a quick google to see if there were any special considerations and found the following MS article. Considerations for Restoring the model and msdb Databases (http://msdn.microsoft.com/en-us/library/ms190749(v=sql.105).aspx). It said both the original and replacement servers must be on the same version, I double-checked and in my case they are both SQL Server 2008 R2 SP1 (10.50.2500).. So I went ahead and stopped SQL Server agent, restored the database and restarted the agent. Checked the jobs and they were all there, everything looked great, and was until the server was rebooted a few days later.Then the syspolicy_purge_history job started failing on the 3rd step with the error message “Unable to start execution of step 3 (reason: The PowerShell subsystem failed to load [see the SQLAGENT.OUT file for details]; The job has been suspended). The step failed.” A bit more googling pointed me to the msdb.dbo.syssubsystems table SELECT * FROM msdb.dbo.syssubsystems WHERE start_entry_point ='PowerShellStart' And in particular the value for the subsystem_dll. It still had the path to the SQLPOWERSHELLSS.DLL but on the old server. The DR instance has a different name to the live instance and so the paths are different. This was quickly fixed with the following SQL Use msdb; GO sp_configure 'allow updates', 1 ; RECONFIGURE WITH OVERRIDE ; GO UPDATE msdb.dbo.syssubsystems SET subsystem_dll='C:\Program Files\Microsoft SQL Server\MSSQL10_50.DR\MSSQL\binn\SQLPOWERSHELLSS.DLL' WHERE start_entry_point ='PowerShellStart'; GO sp_configure 'allow updates', 0; RECONFIGURE WITH OVERRIDE ; GO Stopped and started SQL Server agent and now the job completes. I then wondered if anything else might be broken, SELECT subsystem_dll FROM msdb.dbo.syssubsystems Shows a further 10 wrong paths – fortunately for parts of SQL (replication, SSIS etc) we aren’t using! Lessons Learnt 1. DR exercises are a good thing! 2. Keep the Live and DR environments as similar as possible.

Read the article

The most dangerous SQL Script in the world!

- by DrJohn

In my last blog entry, I outlined how to automate SQL Server database builds from concatenated SQL Scripts. However, I did not mention how I ensure the database is clean before I rebuild it. Clearly a simple DROP/CREATE DATABASE command would suffice; but you may not have permission to execute such commands, especially in a corporate environment controlled by a centralised DBA team. However, you should at least have database owner permissions on the development database so you can actually do your job! Then you can employ my universal "drop all" script which will clear down your database before you run your SQL Scripts to rebuild all the database objects. Why start with a clean database? During the development process, it is all too easy to leave old objects hanging around in the database which can have unforeseen consequences. For example, when you rename a table you may forget to delete the old table and change all the related views to use the new table. Clearly this will mean an end-user querying the views will get the wrong data and your reputation will take a nose dive as a result! Starting with a clean, empty database and then building all your database objects using SQL Scripts using the technique outlined in my previous blog means you know exactly what you have in your database. The database can then be repopulated using SSIS and bingo; you have a data mart "to go". My universal "drop all" SQL Script To ensure you start with a clean database run my universal "drop all" script which you can download from here: 100_drop_all.zip By using the database catalog views, the script finds and drops all of the following database objects: Foreign key relationships Stored procedures Triggers Database triggers Views Tables Functions Partition schemes Partition functions XML Schema Collections Schemas Types Service broker services Service broker queues Service broker contracts Service broker message types SQLCLR assemblies There are two optional sections to the script: drop users and drop roles. You may use these at your peril, particularly as you may well remove your own permissions! Note that the script has a verbose mode which displays the SQL commands it is executing. This can be switched on by setting @debug=1. Running this script against one of the system databases is certainly not recommended! So I advise you to keep a USE database statement at the top of the file. Good luck and be careful!!

Read the article

Great opportunity to try Windows Azure over the next 7 days if you are a UK developer – act to

- by Eric Nelson

Are you a UK based developer who has been put off from trying out the Windows Azure Platform? Were you concerned that you needed to hand over credit card details even to use the introductory offer? Or concerned about how many charges you might run up as you played with “elastic computing”. Then we might have just what you need. 7 Days of access to the Windows Azure Platform – for FREE (expires June 6th 2010) If you are accepted, you will be given a Windows Azure Platfom subscription that will enable you to create Windows Azure hosted services and storage accounts, SQL Azure databases and AppFabric services without any fear of being charged between now and Sunday the 6th of June 2010. No credit card is required. Important: At the end of Sunday your subscription and all your code and data you have uploaded will be deleted. It is your responsibility to keep local copies of your code and data. Apply now To apply for this offer you need to: email ukdev AT microsoft.com with a subject line that starts “UKAZURETRAIL:” (This must be present) In the email you need to demonstrate you are UK based (.uk email alias or address or… be creative) And you must include 30 to 100 words explaining What your interest is in the Windows Azure Platform and Cloud Computing What you would use the 7 days to explore Some notes (please read!): We have a limited number of these offers to give away on a first come, first served basis (subject to meeting the above criteria). We plan to process all request asap – but there is a UK bank holiday weekend looming. We will do our best to process all by Tues afternoon (which would still give you 5 days of access) There will be no specific support for this offer. We will not be processing any requests that arrive after Tuesday 1st. In case you were wondering, there is no equivalent offer for developer outside of the UK. This offer is a direct result of UK based training we are currently doing which has some spare Azure capacity which we wanted to make best use of. Sorry in advance if you based outside of the UK. Related Links: If you are UK based, you should also join the UK Windows Azure Platform community http://ukazure.ning.com Microsoft UK Windows Azure Platform page

Read the article

Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

How to suggest using an ORM instead of stored procedures?

- by Wayne M

I work at a company that only uses stored procedures for all data access, which makes it very annoying to keep our local databases in sync as every commit we have to run new procs. I have used some basic ORMs in the past and I find the experience much better and cleaner. I'd like to suggest to the development manager and rest of the team that we look into using an ORM Of some kind for future development (the rest of the team are only familiar with stored procedures and have never used anything else). The current architecture is .NET 3.5 written like .NET 1.1, with "god classes" that use a strange implementation of ActiveRecord and return untyped DataSets which are looped over in code-behind files - the classes work something like this: class Foo { public bool LoadFoo() { bool blnResult = false; if (this.FooID == 0) { throw new Exception("FooID must be set before calling this method."); } DataSet ds = // ... call to Sproc if (ds.Tables[0].Rows.Count > 0) { foo.FooName = ds.Tables[0].Rows[0]["FooName"].ToString(); // other properties set blnResult = true; } return blnResult; } } // Consumer Foo foo = new Foo(); foo.FooID = 1234; foo.LoadFoo(); // do stuff with foo... There is pretty much no application of any design patterns. There are no tests whatsoever (nobody else knows how to write unit tests, and testing is done through manually loading up the website and poking around). Looking through our database we have: 199 tables, 13 views, a whopping 926 stored procedures and 93 functions. About 30 or so tables are used for batch jobs or external things, the remainder are used in our core application. Is it even worth pursuing a different approach in this scenario? I'm talking about moving forward only since we aren't allowed to refactor the existing code since "it works" so we cannot change the existing classes to use an ORM, but I don't know how often we add brand new modules instead of adding to/fixing current modules so I'm not sure if an ORM is the right approach (too much invested in stored procedures and DataSets). If it is the right choice, how should I present the case for using one? Off the top of my head the only benefits I can think of is having cleaner code (although it might not be, since the current architecture isn't built with ORMs in mind so we would basically be jury-rigging ORMs on to future modules but the old ones would still be using the DataSets) and less hassle to have to remember what procedure scripts have been run and which need to be run, etc. but that's it, and I don't know how compelling an argument that would be. Maintainability is another concern but one that nobody except me seems to be concerned about.

Read the article

Database unit testing is now available for SSDT

- by jamiet

Good news was announced yesterday for those that are using SSDT and want to write unit tests, unit testing functionality is now available. The announcement was made on the SSDT team blog in post Available Today: SSDT—December 2012. Here are a few thoughts about this news. Firstly, there seems to be a general impression that database unit testing was not previously available for SSDT – that’s not entirely true. Database unit testing was most recently delivered in Visual Studio 2010 and any database unit tests written therein work perfectly well against SQL Server databases created using SSDT (why wouldn’t they – its just a database after all). In other words, if you’re running SSDT inside Visual Studio 2010 then you could carry on freely writing database unit tests; some of the tight integration between the two (e.g. right-click on an object in SQL Server Object Explorer and choose to create a unit test) was not there – but I’ve never found that to be a problem. I am currently working on a project that uses SSDT for database development and have been happily running VS2010 database unit tests for a few months now. All that being said, delivery of database unit testing for SSDT is now with us and that is good news, not least because we now have the ability to create unit tests in VS2012. We also get tight integration with SSDT itself, the like of which I mentioned above. Having now had a look at the new features I was delighted to find that one of my big complaints about database unit testing has been solved. As I reported here on Connect a refactor operation would cause unit test code to get completely mangled. See here the before and after from such an operation: SELECT * FROM bi.ProcessMessageLog pml INNER JOIN bi.[LogMessageType] lmt ON pml.[LogMessageTypeId] = lmt.[LogMessageTypeId] WHERE pml.[LogMessage] = 'Ski[LogMessageTypeName]of message: IApplicationCanceled' AND lmt.[LogMessageType] = 'Warning'; which is obviously not ideal. Thankfully that seems to have been solved with this latest release. One disappointment about this new release is that the process for running tests as part of a CI build has not changed from the horrendously complicated process required previously. Check out my blog post Setting up database unit testing as part of a Continuous Integration build process [VS2010 DB Tools - Datadude] for instructions on how to do it. In that blog post I describe it as “fiddly” – I was being kind when I said that! @Jamiet

Search Results

Search found 17194 results on 688 pages for 'document databases'.

Page 398/688 | < Previous Page | 394 395 396 397 398 399 400 401 402 403 404 405 | Next Page >

- by Marco Russo (SQLBI)

- by Mike Dietrich

- by Dinesh Asanka

- by jeffreyabecker

- by Sandeepan Nath

- by Irem Radzik

- by Pinal Dave

- by KLaker

- by Mladen Prajdic

- by csoto

- by mandy.ho

- by Antoinette O'Sullivan

- by Herve Roggero

- by JuergenKress

- by Akemi Iwaya

- by Mike Dietrich

- by laerte

- by Mike Stiles

- by Manesh Karunakaran

- by David-Betteridge

- by DrJohn

- by Eric Nelson

- by user12620111

- by Wayne M

- by jamiet

< Previous Page | 394 395 396 397 398 399 400 401 402 403 404 405 | Next Page >