The SPARC T-series microprocessor,
released in 2005 by Sun Microsystems and now continued at Oracle,
has a good track record in parallel execution and multi-threaded performance. However, it was less suited to purely single-threaded workloads. The new SPARC T4 processor now fills that gap by
offering 5x better single-thread performance than previous
generations. 
  Building on our long-term
relationship with Talend, a fast-growing ISV positioned by Gartner in
 the “Visionaries” quadrant of the “Magic Quadrant for Data
Integration Tools”, we decided to test some of their integration
components on the T4 chip, more precisely on a T4-1 system, to
verify first hand whether this new processor lives up to its
promises. 
   Several tests were performed,
mainly focused on:  
   
    Single-thread performance of
	the new SPARC T4 processor compared to an older SPARC
	T2+ processor
     
    Overall throughput of the
	SPARC T4-1 server using multiple threads
     
   
   The tests consisted of reading
large amounts of data (tens of gigabytes), processing them, and writing
them back to a file or an Oracle 11gR2 database table. They are CPU-,
memory- and IO-bound tests. Given the main focus of this project (CPU
performance), bottlenecks were removed as much as possible from the memory
and IO subsystems. When possible, for instance, the data to process was placed
in the ZFS filesystem cache. Also, two external storage devices
were directly attached to the servers under test, each one divided
into two ZFS pools for read and write operations. 
   
  Multi-thread: Testing throughput on the Oracle
T4-1 
   The tests were performed with
different numbers of simultaneous threads (1, 2, 4, 8, 12, 16, 32, 48
and 64) and using different storage devices: Flash, Fibre Channel
storage, two striped internal disks and one single internal disk.
All storage devices used ZFS as filesystem and volume manager. 
  Each thread read a dedicated
1 GB file containing 12.5M lines with the following
structure:  
    customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT 
  1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008
2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008
3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;A;29-04-2005;09-02-2008
4;Dwight;Adams;South Roosevelt Drive;Baton Rouge;Vermont;75677;A;15-02-2004;26-01-2007
[…]
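
As a rough illustration of the record layout above, the following Python sketch parses the semicolon-delimited lines into dictionaries keyed by the header's column names. The function name `read_customers` is hypothetical, and the day-month-year date format is an assumption inferred from sample values such as 27-05-2008; the actual tests used Talend components, not this code.

```python
import csv
import io
from datetime import datetime

# First lines of the input file, copied from the sample above.
SAMPLE = """customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT
1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008
2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008
"""


def read_customers(stream):
    """Yield one dict per data line, keyed by the header's column names."""
    reader = csv.DictReader(stream, delimiter=";")
    for row in reader:
        # Assumption: dates are day-month-year, as values like 27-05-2008 suggest.
        row["Since_DT"] = datetime.strptime(row["Since_DT"], "%d-%m-%Y").date()
        row["Status_DT"] = datetime.strptime(row["Status_DT"], "%d-%m-%Y").date()
        yield row


rows = list(read_customers(io.StringIO(SAMPLE)))
print(rows[0]["LastName"], rows[1]["Status_DT"].isoformat())
```

With 12.5M lines in a 1 GB file, each line averages roughly 80 bytes, which is consistent with the length of the sample lines.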
 
   The following graphs present the
results of our tests:   
    
  Unsurprisingly, up to 16 threads,
all files fit in the ZFS cache (the ARC): once the cache is hot,
there is no performance difference between the underlying
storage devices. From 16 threads upwards, however, IO clearly becomes
a bottleneck, so a good IO subsystem is key. Single-disk performance collapses, whereas the Sun F5100 and ST6180 arrays allow the
T4-1 to scale quite seamlessly.  From 32 to 64 threads,
performance is almost constant, with just a slow decline. 
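
The shape of such a scaling test (one dedicated file per worker, aggregate throughput measured at each level of parallelism) can be sketched in Python as follows. This is only an outline of the measurement structure under stated assumptions: the helper names, file names and file sizes are hypothetical, the job is reduced to counting lines, and Python threads do not scale CPU-bound work the way the parallel JVM processes used in the real tests do.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def count_lines(path):
    """Stand-in for one job: read a dedicated file and count its lines."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def run_level(paths):
    """Process one file per worker in parallel; return (total lines, lines/second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        total = sum(pool.map(count_lines, paths))
    elapsed = time.perf_counter() - start
    return total, total / elapsed


def make_files(directory, n_files, lines_per_file):
    """Create small hypothetical input files (the real ones held 12.5M lines each)."""
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, f"customers_{i}.csv")
        with open(path, "w", encoding="utf-8") as f:
            for n in range(lines_per_file):
                f.write(f"{n};First;Last;Some Street;City;State;75677;A;01-01-2005;01-01-2008\n")
        paths.append(path)
    return paths


results = {}
with tempfile.TemporaryDirectory() as tmp:
    files = make_files(tmp, n_files=4, lines_per_file=1000)
    for workers in (1, 2, 4):
        total, rate = run_level(files[:workers])
        results[workers] = total
        print(f"{workers} workers: {total} lines, {rate:,.0f} lines/s")
```

Plotting the measured rate against the number of workers for each storage device gives curves of the kind discussed above.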
   For the database load tests, only
the best IO configuration (using external storage devices) was
used, hosting the Oracle tablespaces and redo log files. 
    
  Using the Sun Storage F5100 array allows the T4-1 server to scale up to 48 parallel JVM
processes before saturating the CPU.  The final result is a
staggering 646K lines per second inserted into an Oracle table using
48 parallel threads.   
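
To make the load pattern concrete, here is a minimal sketch of reading the delimited file and inserting its rows one statement at a time, the non-bulk style mentioned in the closing quote. SQLite from Python's standard library stands in for Oracle 11gR2 purely so that the example is self-contained; the table name `customers` and the function `load_rows` are hypothetical, and nothing here reflects how the Talend components actually talk to Oracle.

```python
import csv
import io
import sqlite3

# First lines of the input file, copied from the sample shown earlier.
SAMPLE = """customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT
1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008
2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008
3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;A;29-04-2005;09-02-2008
"""


def load_rows(conn, stream):
    """Insert each data line with its own INSERT statement; return the row count."""
    reader = csv.reader(stream, delimiter=";")
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    # Untyped columns keep the sketch short; a real schema would declare types.
    conn.execute(f"CREATE TABLE customers ({', '.join(header)})")
    count = 0
    for row in reader:
        conn.execute(f"INSERT INTO customers VALUES ({placeholders})", row)
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")  # in-memory database, an assumption for the demo
inserted = load_rows(conn, io.StringIO(SAMPLE))
stored = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(inserted, stored)
```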
  Single-thread: Testing the single thread
performance 
   Seven different tests were
performed on both servers. Given that only one thread, and thus
one file, was involved, there was no IO bottleneck: all data was
served from the ZFS cache. 
   
     
      Read File → Filter → Write File: Read file, filter data, write the filtered data in a new file.
	The filter is set on the “Status” column: only lines with status
	set to “A” are selected. This limits each output file to about
	500 MB.
     
     
      Read File → Load Database Table: Read file, insert into a single Oracle table.
     
     
      Average: Read file, compute the
	average of a numeric column, write the result in a new file.
     
    Division & Square Root: Read file, perform a division and square root on a numeric column, write
	the result data in a new file.
     
     
      Oracle DB Dump: Dump the content of an Oracle table (12.5M rows) into a CSV file.
     
    Transform: Read file, transform,
	write the result data in a new file. The transformations applied
	are: set the address column to upper case and add an extra column at
	the end, which is the concatenation of two columns.
     
     
      Sort: Read file, sort on a numeric
	and an alphanumeric column, write the result data in a new file.
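
Three of the tests listed above (filter, transform and sort) can be sketched in a few lines of Python to show what each one does to the data. This is an illustration under stated assumptions, not the Talend jobs themselves: the function names are hypothetical, the source does not say which two columns are concatenated or which columns are sorted, so `FirstName`/`LastName` and `Zip`/`LastName` are arbitrary choices, and the fifth sample row is invented so that the filter has something to drop.

```python
import csv
import io

HEADER = "customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT\n"
SAMPLE = HEADER + (
    "1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008\n"
    "2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008\n"
    "3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;A;29-04-2005;09-02-2008\n"
    "4;Dwight;Adams;South Roosevelt Drive;Baton Rouge;Vermont;75677;A;15-02-2004;26-01-2007\n"
    # Hypothetical row, not from the source, so that the filter has something to drop.
    "5;Jane;Doe;Example Road;Springfield;Ohio;12345;I;01-01-2001;01-01-2002\n"
)


def read_rows(text):
    """Parse the semicolon-delimited text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def filter_active(rows):
    """Read/Filter/Write: keep only the lines whose status is "A"."""
    return [r for r in rows if r["Cust_Status"] == "A"]


def transform(rows):
    """Transform: upper-case the address and append a concatenated column."""
    out = []
    for r in rows:
        r = dict(r)  # copy so the input rows stay untouched
        r["StreetAddress"] = r["StreetAddress"].upper()
        # Assumption: the source does not say which two columns are concatenated.
        r["FullName"] = r["FirstName"] + " " + r["LastName"]
        out.append(r)
    return out


def sort_rows(rows):
    """Sort: order by a numeric column, then by an alphanumeric one."""
    # Assumption: Zip and LastName are illustrative column choices.
    return sorted(rows, key=lambda r: (int(r["Zip"]), r["LastName"]))


def write_rows(rows):
    """Serialize rows back to the same semicolon-delimited format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            delimiter=";", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = read_rows(SAMPLE)
active = filter_active(rows)
result = sort_rows(transform(active))
output = write_rows(result)
print(output)
```

In the real tests each operation ran as a separate job over a full 12.5M-line file; chaining them here only keeps the example short.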
     
   
  The following table and graph
present the final results of the tests: 
   
    Throughput unit is thousand
	lines per second processed (K lines/second).
     
    Improvement is the T5140 time divided by the
	T4-1 time, expressed as a percentage.
 
   
   
     
       
         
Test                    T4-1 time (s)  T5140 time (s)  Improvement  T4-1 throughput (K lines/s)  T5140 throughput (K lines/s)
Read/Filter/Write       125            806             645%         100                          16
Read/Load Database      195            1111            570%         64                           11
Average                 96             557             580%         130                          22
Division & Square Root  161            1054            655%         78                           12
Oracle DB Dump          164            945             576%         76                           13
Transform               159            1124            707%         79                           11
Sort                    251            1336            532%         50                           9
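
The throughput and improvement columns follow directly from the elapsed times and the 12.5M lines processed per run. The short Python sketch below recomputes them from the times in the table; the function names are hypothetical, and the formulas are inferred from the published figures rather than stated by the test procedure.

```python
LINES_PER_RUN = 12_500_000  # each run processes 12.5M lines (or table rows)

# Elapsed times in seconds, copied from the table: test -> (T4-1, T5140).
TIMES = {
    "Read/Filter/Write": (125, 806),
    "Read/Load Database": (195, 1111),
    "Average": (96, 557),
    "Division & Square Root": (161, 1054),
    "Oracle DB Dump": (164, 945),
    "Transform": (159, 1124),
    "Sort": (251, 1336),
}


def throughput_k(seconds):
    """Thousand lines per second for one 12.5M-line run, rounded."""
    return round(LINES_PER_RUN / seconds / 1000)


def improvement_pct(t4_seconds, t5140_seconds):
    """T5140 time over T4-1 time, as a whole percentage."""
    return round(100 * t5140_seconds / t4_seconds)


computed = {}
for test, (t4, t5140) in TIMES.items():
    computed[test] = (improvement_pct(t4, t5140), throughput_k(t4), throughput_k(t5140))
    print(f"{test}: {computed[test][0]}% | {computed[test][1]} vs {computed[test][2]} K lines/s")
```

For the Sort test, for example, 1336 / 251 gives an improvement of 532%, and 12.5M lines in 251 seconds gives about 50K lines per second.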
         
       
     
   
    
  The improvement in single-thread performance is quite dramatic:
depending on the test, the T4 is between 5.3 and 7 times faster than
the T2+. It seems clear that the SPARC T4 processor has gone a long
way towards filling the gap in single-thread performance without
sacrificing multi-threaded capability, as it still shows very impressive scaling on heavy-duty multi-threaded jobs.     
  
Finally, as always at Oracle ISV
Engineering, we are happy to help our ISV partners test their own
applications on our platforms, so don't hesitate to contact us and
let's see what the SPARC T4-based systems can do for your application! 
    
   "As described in this benchmark, Talend Enterprise Data Integration has overperformed on T4. I was generally happy to see that the T4 gave scaling opportunities for many scenarios like complex aggregations. Row by row insertion in Oracle DB is faster with more than 650,000 rows per second without using any bulk Oracle capabilities!"  
  Cedric Carbone, Talend CTO.