What's up with LDoms: Part 9 - Direct IO

Posted by Stefan Hinker on Oracle Blogs See other posts from Oracle Blogs or by Stefan Hinker
Published on Wed, 20 Aug 2014 08:39:00 +0000 Indexed on 2014/08/20 10:27 UTC
Read the original article Hit count: 364

Filed under:

In the last article of this series, we discussed the most general of all physical IO options available for LDoms, root domains.  Now, let's have a short look at the next level of granularity: Virtualizing individual PCIe slots.  In the LDoms terminology, this feature is called "Direct IO" or DIO.  It is very similar to root domains, but instead of reassigning ownership of a complete root complex, it only moves a single PCIe slot or endpoint device to a different domain.  Let's look again at hardware available to mars in the original configuration:

root@sun:~# ldm ls-io
NAME                                      TYPE   BUS      DOMAIN   STATUS  
----                                      ----   ---      ------   ------  
pci_0                                     BUS    pci_0    primary          
pci_1                                     BUS    pci_1    primary          
pci_2                                     BUS    pci_2    primary          
pci_3                                     BUS    pci_3    primary          
/SYS/MB/PCIE1                             PCIE   pci_0    primary  EMP     
/SYS/MB/SASHBA0                           PCIE   pci_0    primary  OCC
/SYS/MB/NET0                              PCIE   pci_0    primary  OCC     
/SYS/MB/PCIE5                             PCIE   pci_1    primary  EMP     
/SYS/MB/PCIE6                             PCIE   pci_1    primary  EMP     
/SYS/MB/PCIE7                             PCIE   pci_1    primary  EMP     
/SYS/MB/PCIE2                             PCIE   pci_2    primary  EMP     
/SYS/MB/PCIE3                             PCIE   pci_2    primary  OCC     
/SYS/MB/PCIE4                             PCIE   pci_2    primary  EMP     
/SYS/MB/PCIE8                             PCIE   pci_3    primary  EMP     
/SYS/MB/SASHBA1                           PCIE   pci_3    primary  OCC     
/SYS/MB/NET2                              PCIE   pci_3    primary  OCC     
/SYS/MB/NET0/IOVNET.PF0                   PF     pci_0    primary          
/SYS/MB/NET0/IOVNET.PF1                   PF     pci_0    primary          
/SYS/MB/NET2/IOVNET.PF0                   PF     pci_3    primary          
/SYS/MB/NET2/IOVNET.PF1                   PF     pci_3    primary

All of the "PCIE" type devices are available for SDIO, with a few limitations.  If the device is a slot, the card in that slot must support the DIO feature.  The documentation lists all such cards.  Moving a slot to a different domain works just like moving a PCI root complex.  Again, this is not a dynamic process and includes reboots of the affected domains.  The resulting configuration is nicely shown in a diagram in the Admin Guide:

There are several important things to note and consider here:

  • The domain receiving the slot/endpoint device turns into an IO domain in LDoms terminology, because it now owns some physical IO hardware.
  • Solaris will create nodes for this hardware under /devices.  This includes entries for the virtual PCI root complex (pci_0 in the diagram) and anything between it and the actual endpoint device.  It is very important to understand that all of this PCIe infrastructure is virtual only!  Only the actual endpoint devices are true physical hardware.
  • There is an implicit dependency between the guest owning the endpoint device and the root domain owning the real PCIe infrastructure:
    • Only if the root domain is up and running, will the guest domain have access to the endpoint device.
    • The root domain is still responsible for resetting and configuring the PCIe infrastructure (root complex, PCIe level configurations, error handling etc.) because it owns this part of the physical infrastructure.
    • This also means that if the root domain needs to reset the PCIe root complex for any reason (typically a reboot of the root domain) it will reset and thus disrupt the operation of the endpoint device owned by the guest domain.  The result in the guest is not predictable.  I recommend to configure the resulting behaviour of the guest using domain dependencies as described in the Admin Guide in Chapter "Configuring Domain Dependencies".
  • Please consult the Admin Guide in Section "Creating an I/O Domain by Assigning PCIe Endpoint Devices" for all the details!

As you can see, there are several restrictions for this feature.  It was introduced in LDoms 2.0, mainly to allow the configuration of guest domains that need access to tape devices.  Today, with the higher number of PCIe root complexes and the availability of SR-IOV, the need to use this feature is declining.  I personally do not recommend to use it, mainly because of the drawbacks of the depencies on the root domain and because it can be replaced with SR-IOV (although then with similar limitations).

This was a rather short entry, more for completeness.  I believe that DIO can usually be replaced by SR-IOV, which is much more flexible.  I will cover SR-IOV in the next section of this blog series.

© Oracle Blogs or respective owner

Related posts about /CMT