Solaris X86 AESNI OpenSSL Engine
- by danx
Cryptography is a major component of secure e-commerce.
Since cryptography is compute intensive and adds a significant load to applications, such as SSL web servers (https), crypto performance is an important factor.
Accelerated crypto hardware greatly helps these applications, lowers the cost of cryptography, and should lead to its wider adoption in e-commerce and other applications.
The Intel Westmere microprocessor has six new instructions to accelerate AES encryption.
They are called "AESNI" for "AES New Instructions".
These are unprivileged instructions, so no "root", other elevated access, or context switch is required to execute these instructions.
These instructions are used in a new built-in OpenSSL 1.0 engine available in Solaris 11, the aesni engine.
Previous Work
Previously, AESNI instructions were introduced into the Solaris x86 kernel and libraries: the "aes" kernel module (used by IPsec and other kernel modules) and the Solaris PKCS#11 library (for user applications). These are available in Solaris 10 10/09 (update 8) and later, and in Solaris 11.
The work here is to add the aesni engine to OpenSSL.
X86 AESNI Instructions
Intel's Xeon 5600 is one of the processors that support AESNI. This processor is used in the Sun Fire X4170 M2.
As mentioned above, six new instructions accelerate AES encryption in processor silicon. The new instructions are:
aesenc
	performs one round of AES encryption.  One encryption round is composed of these steps: substitute bytes, shift rows, mix columns, and xor the round key.
	
aesenclast
	performs the final encryption round, which is the same as above except that it omits the mix columns step (which is only needed when another encryption round follows).
	
aesdec
	performs one round of AES decryption
	
aesdeclast
	performs the final AES decryption round
	
aeskeygenassist
	helps expand the user-provided key into a "key schedule" of keys, one per round
	
aesimc
	performs an "inverse mixed columns" operation to convert the encryption key schedule into a decryption key schedule
	
pclmulqdq
	Not an AESNI instruction, but performs "carryless multiply" operations to accelerate AES GCM mode.
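To make the round steps above concrete, here is a minimal, portable C sketch of what aesenc and aesenclast compute, plus the AES-128 key expansion that aeskeygenassist helps with. It is an illustrative reference model only, assuming the standard FIPS-197 byte order for the 128-bit state; it is not the Solaris or OpenSSL code, it is far slower than the hardware, and unlike the AESNI instructions it is not constant-time. The function names (aes_enc_round, aes_enc_last, aes128_expand, aes128_encrypt_block) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-box: multiplicative inverse (x^254, which maps 0 to 0), then the affine map. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 1;
    for (int i = 0; i < 254; i++) inv = gf_mul(inv, x);
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^
                     rotl8(inv, 4) ^ 0x63);
}

/* State is 16 bytes in FIPS-197 order: byte i is row i % 4, column i / 4. */
static void sub_bytes_shift_rows(uint8_t s[16]) {
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)          /* row r rotates left by r */
            t[r + 4 * c] = sbox(s[r + 4 * ((c + r) % 4)]);
    memcpy(s, t, 16);
}

static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c;
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = (uint8_t)(gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3);
        a[1] = (uint8_t)(a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3);
        a[2] = (uint8_t)(a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3));
        a[3] = (uint8_t)(gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2));
    }
}

static void add_round_key(uint8_t s[16], const uint8_t rk[16]) {
    for (int i = 0; i < 16; i++) s[i] ^= rk[i];
}

/* Models aesenc: substitute bytes, shift rows, mix columns, xor round key. */
void aes_enc_round(uint8_t s[16], const uint8_t rk[16]) {
    sub_bytes_shift_rows(s);
    mix_columns(s);
    add_round_key(s, rk);
}

/* Models aesenclast: the same round without mix columns. */
void aes_enc_last(uint8_t s[16], const uint8_t rk[16]) {
    sub_bytes_shift_rows(s);
    add_round_key(s, rk);
}

/* AES-128 key schedule (11 round keys). The SubWord(RotWord(w)) ^ Rcon
   term is the part aeskeygenassist helps compute in hardware. */
void aes128_expand(const uint8_t key[16], uint8_t rk[11][16]) {
    uint8_t rcon = 0x01;
    memcpy(rk[0], key, 16);
    for (int r = 1; r <= 10; r++) {
        const uint8_t *p = rk[r - 1];
        uint8_t t[4] = { sbox(p[13]), sbox(p[14]), sbox(p[15]), sbox(p[12]) };
        t[0] ^= rcon;
        for (int i = 0; i < 4; i++) rk[r][i] = p[i] ^ t[i];
        for (int i = 4; i < 16; i++) rk[r][i] = p[i] ^ rk[r][i - 4];
        rcon = gf_mul(rcon, 2);
    }
}

/* One AES-128 block: initial key xor, 9 aesenc rounds, 1 aesenclast round. */
void aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]) {
    uint8_t rk[11][16];
    aes128_expand(key, rk);
    memcpy(out, in, 16);
    add_round_key(out, rk[0]);
    for (int r = 1; r < 10; r++) aes_enc_round(out, rk[r]);
    aes_enc_last(out, rk[10]);
}
```

For example, with the FIPS-197 Appendix C.1 key 000102030405060708090a0b0c0d0e0f and plaintext 00112233445566778899aabbccddeeff, aes128_encrypt_block() should produce 69c4e0d86a7b0430d8cdb78070b4c55a. On AESNI hardware, each aes_enc_round() call in the loop corresponds to a single aesenc instruction operating on an XMM register.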
Since the AESNI instructions are implemented in hardware, they take a constant number of cycles and are not vulnerable to side-channel timing attacks that attempt to discern some bits of data from the time taken to encrypt or decrypt the data.
Solaris x86 and OpenSSL Software Optimizations
Having X86 AESNI hardware crypto instructions is all well and good, but how do we access them?
The software is available with Solaris 11 and is used automatically
if you are running Solaris x86 on an AESNI-capable processor.
AESNI is used internally in the kernel through kernel crypto modules and
is available in user space through the PKCS#11 library.
For OpenSSL on Solaris 11, AESNI crypto is available directly through a new built-in OpenSSL 1.0 engine, called the "aesni engine."
This avoids the extra overhead of going through the Solaris OpenSSL pkcs11 engine,
which accesses Solaris crypto and digest operations.
Instead, the AESNI assembly is included directly in the new aesni engine.
Rather than being shipped as a separate library in /lib/openssl/engines/,
the aesni engine is "built-in", meaning it is included directly in OpenSSL's libcrypto.so.1.0.0 library.
This reduces overhead and removes the need to specify the aesni engine manually.
Because the engine is built into libcrypto.so.1.0.0, neither the openssl -engine command line flag nor an API call is needed to access it; the aesni engine is used automatically on AESNI hardware.
Ciphers and Digests supported by OpenSSL aesni engine
The OpenSSL aesni engine auto-detects whether it is running on AESNI hardware and, if so, uses AESNI encryption instructions
for these ciphers: 
AES-128-CBC, AES-192-CBC, AES-256-CBC,
AES-128-CFB128, AES-192-CFB128, AES-256-CFB128,
AES-128-CTR, AES-192-CTR, AES-256-CTR,
AES-128-ECB, AES-192-ECB, AES-256-ECB,
AES-128-OFB, AES-192-OFB, and AES-256-OFB.
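The auto-detection comes down to a CPU feature check. Below is a minimal sketch, assuming a GCC- or Clang-compatible compiler that provides <cpuid.h>, of the CPUID test for AESNI (leaf 1, ECX bit 25) and pclmulqdq (leaf 1, ECX bit 1), followed by a one-time choice between an AESNI path and the software fallback. The names ecx_has_aes, cpuid1_ecx, select_aes_impl, and enum aes_impl are hypothetical; this is not the engine's actual detection code.

```c
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GET_CPUID 1
#endif

/* CPUID leaf 1 reports AES in ECX bit 25 and PCLMULQDQ in ECX bit 1. */
#define ECX_AES_BIT    (1u << 25)
#define ECX_PCLMUL_BIT (1u << 1)

enum aes_impl { AES_IMPL_SOFTWARE, AES_IMPL_AESNI };

int ecx_has_aes(uint32_t ecx)    { return (ecx & ECX_AES_BIT) != 0; }
int ecx_has_pclmul(uint32_t ecx) { return (ecx & ECX_PCLMUL_BIT) != 0; }

/* Returns CPUID.(EAX=1):ECX, or 0 if it cannot be read (non-x86 target,
   compiler without <cpuid.h>, or leaf 1 unsupported). */
uint32_t cpuid1_ecx(void) {
#ifdef HAVE_GET_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

/* Choose the implementation once (e.g. at initialization); without the
   AES bit, fall back to the ordinary software AES. */
enum aes_impl select_aes_impl(uint32_t ecx) {
    return ecx_has_aes(ecx) ? AES_IMPL_AESNI : AES_IMPL_SOFTWARE;
}
```

A caller would evaluate select_aes_impl(cpuid1_ecx()) once and keep the result, so the per-block encryption path pays nothing for the check. Treating an unreadable CPUID as "no AESNI" keeps the fallback safe on other platforms.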
Implementation of the OpenSSL aesni engine
The AESNI assembly language routines are not part of the regular OpenSSL 1.0.0 release.
AESNI is part of the "HEAD" ("development" or "unstable") branch of OpenSSL, for a future release.
However, AESNI is also available as a separate patch provided by Intel to the OpenSSL project for OpenSSL 1.0.0.
A minimal amount of "glue" code in the aesni engine works between the OpenSSL libcrypto.so.1.0.0 library and the assembly functions.
The aesni engine code is separate from the base OpenSSL code and requires patching only a few source files to use it.  That means OpenSSL can be more easily updated to future versions without losing the performance from the built-in aesni engine.
OpenSSL aesni engine Performance
Here are some graphs of aesni engine performance, which I measured by running
openssl speed -evp $algorithm, where $algorithm is aes-128-cbc, aes-192-cbc, or aes-256-cbc.
These use the 64-bit version of openssl on the same AESNI hardware, a Sun Fire X4170 M2 with an Intel Xeon E5620 @2.40GHz, running Solaris 11 FCS.
"Before" is openssl without the aesni engine and "after" is openssl with the aesni engine.
The numbers are MBytes/second.
OpenSSL aesni engine performance on Sun Fire X4170 M2 (Xeon E5620 @2.40GHz)
(Higher is better; "before" = OpenSSL on AESNI hardware without the aesni engine software, "after" = OpenSSL with the aesni engine)
As you can see, the speedup is dramatic for all three key lengths and for data sizes from 16 bytes to 8 Kbytes: AESNI is about 7.5 to 8 times faster than hand-coded amd64 assembly without AESNI instructions.
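These measurements come from openssl speed, which itself prints thousands of bytes processed per second for 16, 64, 256, 1024, and 8192-byte blocks. As a rough sketch of what such a measurement involves, assuming a POSIX system with clock_gettime, the hypothetical harness below calls a function on one buffer size for a fixed time and converts the call count into MBytes/second. The names mb_per_sec and measure_mb_per_sec are illustrative, and fn stands in for one encrypt call; this is not how openssl speed is implemented.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Throughput in MBytes/second, taking 1 MB = 1,000,000 bytes. */
double mb_per_sec(size_t bytes_per_call, unsigned long calls, double seconds) {
    return (double)bytes_per_call * (double)calls / seconds / 1e6;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Call fn on a len-byte buffer for at least `duration` seconds (must be > 0),
   then report the achieved throughput. */
double measure_mb_per_sec(void (*fn)(unsigned char *buf, size_t len),
                          unsigned char *buf, size_t len, double duration) {
    unsigned long calls = 0;
    double start = now_sec(), elapsed;
    do {
        fn(buf, len);
        calls++;
        elapsed = now_sec() - start;
    } while (elapsed < duration);
    return mb_per_sec(len, calls, elapsed);
}
```

Running it once per block size, with and without AESNI, yields a before/after comparison of the same shape as the graphs; dividing the "after" figure by the "before" figure for each size gives the speedup.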
Verifying the OpenSSL aesni engine is present
The easiest way to determine whether you are running the aesni engine is to type "openssl engine" on the command line.
No configuration, API, or command line options are needed to use the OpenSSL aesni engine.
If you are running on Intel AESNI hardware with Solaris 11 FCS, you'll see this output indicating you are using the aesni engine:
intel-westmere $ openssl engine
(aesni) Intel AES-NI engine
(dynamic) Dynamic engine loading support
(pkcs11) PKCS #11 engine support
If you are running on Intel hardware without AESNI, you'll see this output, indicating that the hardware can't support the aesni engine:
intel-nehalem $ openssl engine
(aesni) Intel AES-NI engine (no-aesni)
(dynamic) Dynamic engine loading support
(pkcs11) PKCS #11 engine support
For Solaris on SPARC or older Solaris OpenSSL software, you won't see any aesni engine line at all.
Third-party OpenSSL software (built yourself or from outside Oracle) will not have the aesni engine either.
Solaris 11 FCS comes with OpenSSL version 1.0.0e. The output of typing "openssl version" should be "OpenSSL 1.0.0e 6 Sep 2011".
64- and 32-bit OpenSSL
OpenSSL comes in both 32- and 64-bit binaries.
The 64-bit executable is now the default, at /usr/bin/openssl, and the 64-bit OpenSSL libraries libcrypto.so.1.0.0 and libssl.so.1.0.0 are in /lib/amd64/.
The 32-bit executable is at /usr/bin/i86/openssl, and
the 32-bit libraries are
/lib/libcrypto.so.1.0.0 and /lib/libssl.so.1.0.0.
Availability
The OpenSSL AESNI engine is available in Solaris 11 x86 for both the 64- and 32-bit versions of OpenSSL.
It is not available with Solaris 10.
You must have a processor that supports AESNI instructions; otherwise, OpenSSL falls back to the older, slower AES implementation without AESNI.
Processors that support AESNI include most Westmere and Sandy Bridge class processors. Some low-end processors (such as those for mobile/laptop platforms) do not support AESNI. The easiest way to determine whether the processor supports AESNI is the isainfo -v command; look for "amd64" and "aes" in the output:
$ isainfo -v
64-bit amd64 applications
        pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse 
        fxsr mmx cmov amd_sysc cx8 tsc fpu
Conclusion
The Solaris 11 OpenSSL aesni engine provides easy access to powerful Intel AESNI hardware cryptography,
in addition to Solaris userland PKCS#11 libraries and Solaris crypto kernel modules.