Oracle Data Integrator 11.1.1.5 Complex Files as Sources and Targets
- by Alex Kotopoulis
Overview 
ODI 11.1.1.5 adds the new Complex File technology for use with file sources and targets. The goal is to read or write file structures that are too complex to be parsed using the existing ODI File technology. This includes:
  
    
    Different record types in one list that use different parsing rules
     
    
    Hierarchical lists, for example customers with nested orders
     
    
    Parsing instructions in the file data, such as delimiter types, field lengths, type identifiers
     
    
    Complex headers such as multiple header lines or parseable information in header
     
    
    Skipping of lines
     
    
     Conditional or choice fields
    
  
Similar to the ODI File and XML File technologies, the complex file parsing is done through a JDBC driver that exposes the flat file as relational table structures. Complex files are mapped to one or more table structures, as opposed to the (simple) file technology, which always has a one-to-one relationship between file and table. The resulting set of tables follows the same concept as the ODI XML driver, table rows have additional PK-FK relationships to express hierarchy as well as order values to maintain the file order in the resulting table.
 
  
The parsing instruction format used for complex files is the nXSD (native XSD) format that is already in use with Oracle BPEL. This format extends the XML Schema standard by adding additional parsing instructions to each element. Using nXSD parsing technology, the native file is converted into an internal XML format. It is important to understand that the XML is streamed to improve performance; there is no size limitation of the native file based on memory size, the XML data is never fully materialized.  The internal XML is then converted to relational schema using the same mapping rules as the ODI XML driver. 
How to Create an nXSD file
Complex file models depend on the nXSD schema for the given file. This nXSD file has to be created using a text editor or the Native Format Builder Wizard that is part of Oracle BPEL. BPEL is included in the ODI Suite, but not in standalone ODI Enterprise Edition. The nXSD format extends the standard XSD format through nxsd attributes. NXSD is a valid XML Schema, since the XSD standard allows extra attributes with their own namespaces.
The following is a sample NXSD schema:
  <?xml version="1.0"?>
  
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" elementFormDefault="qualified" xmlns:tns="http://xmlns.oracle.com/pcbpel/demoSchema/csv" targetNamespace="http://xmlns.oracle.com/pcbpel/demoSchema/csv" attributeFormDefault="unqualified" 
  
  nxsd:encoding="US-ASCII" nxsd:stream="chars" nxsd:version="NXSD">
  
  <xsd:element name="Root">
  
          <xsd:complexType><xsd:sequence>
  
        <xsd:element name="Header">
  
                  <xsd:complexType><xsd:sequence>
  
                          <xsd:element name="Branch" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy=","/>
  
                          <xsd:element name="ListDate" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>
  
                          </xsd:sequence></xsd:complexType>
  
                          </xsd:element>
  
                  </xsd:sequence></xsd:complexType>
  
          <xsd:element name="Customer" maxOccurs="unbounded">
  
                  <xsd:complexType><xsd:sequence>
  
                  <xsd:element name="Name" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy=","/>
  
                          <xsd:element name="Street" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="," />
  
                          <xsd:element name="City" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}" />
  
                          </xsd:sequence></xsd:complexType>
  
                          </xsd:element>
  
                  </xsd:sequence></xsd:complexType>
  
  </xsd:element>
  
  </xsd:schema>
  
The nXSD schema annotates elements to describe their position and delimiters within the flat text file. The schema above uses almost exclusively the nxsd:terminatedBy instruction to look for the next terminator chars. There are various constructs in nXSD to parse fixed length fields, look ahead in the document for string occurences, perform conditional logic, use variables to remember state, and many more. 
nXSD files can either be written manually using an XML Schema Editor or created using the Native Format Builder Wizard. Both Native Format Builder Wizard as well as the nXSD language are described in the Application Server Adapter Users Guide. The way to start the Native Format Builder in BPEL is to create a new File Adapter; in step 8 of the Adapter Configuration Wizard a new Schema for Native Format can be created:
 
The Native Format Builder guides through a number of steps to generate the nXSD based on a sample native file. If the format is complex, it is often a good idea to “approximate” it with a similar simple format and then add the complex components manually.  The resulting *.xsd file can be copied and used as the format for ODI, other BPEL constructs such as the file adapter definition are not relevant for ODI. Using this technique it is also possible to parse the same file format in SOA Suite and ODI, for example using SOA for small real-time messages, and ODI for large batches. 
This nXSD schema in this example describes a file with a header row containing data and 3 string fields per row delimited by commas, for example:
  Redwood City Downtown Branch, 06/01/2011
    Ebeneezer Scrooge, Sandy Lane, Atherton
    Tiny Tim, Winton Terrace, Menlo Park
  
The ODI Complex File JDBC driver exposes the file structure through a set of relational tables with PK-FK relationships. The tables for this example are:
Table ROOT (1 row):
   
    
      
      ROOTPK
      
      
      Primary Key for root element
      
    
    
      
      SNPSFILENAME
      
      
      Name of the file
      
    
    
      
      SNPSFILEPATH
      
      
      Path of the file
      
    
    
      
      SNPSLOADDATE
      
      
      Date of load
      
    
  
Table HEADER (1 row):
   
    
      
      ROOTFK
      
      
      Foreign Key to ROOT record
      
    
    
      
      ROWORDER
      
      
      Order of row in native document
      
    
    
      
      BRANCH
      
      
      Data
      
    
    
      
      BRANCHORDER
      
      
      Order of Branch within row
      
    
    
      
      LISTDATE
      
      
      Data
      
    
    
      
      LISTDATEORDER
      
      
      Order of ListDate within row
      
    
  
Table ADDRESS (2 rows):
   
    
      
      ROOTFK
      
      
      Foreign Key to ROOT record
      
    
    
      
      ROWORDER
      
      
      Order of row in native document
      
    
    
      
      NAME
      
      
      Data
      
    
    
      
      NAMEORDER
      
      
      Oder of Name within row
      
    
    
      
      STREET
      
      
      Data
      
    
    
      
      STREETORDER
      
      
      Order of Street within row
      
    
    
      
      CITY
      
      
      Data
      
    
    
      
      CITYORDER
      
      
      Order of City within row
      
    
  
Every table has PK and/or FK fields to reflect the document hierarchy through relationships. In this example this is trivial since the HEADER and all CUSTOMER records point back to the PK of ROOT. Deeper nested documents require this to identify parent elements. All tables also have a ROWORDER field to define the order of rows, as well as order fields for each column, in case the order of columns varies in the original document and needs to be maintained. If order is not relevant, these fields can be ignored. 
How to Create an Complex File Data Server in ODI
After creating the nXSD file and a test data file, and storing it on the local file system accessible to ODI, you can go to the ODI Topology Navigator to create a Data Server and Physical Schema under the Complex File technology. 
This technology follows the conventions of other ODI technologies and is very similar to the XML technology. The parsing settings such as the source native file, the nXSD schema file, the root element, as well as the external database can be set in the JDBC URL:
The use of an external database defined by dbprops is optional, but is strongly recommended for production use. Ideally, the staging database should be used for this. Also, when using a complex file exclusively for read purposes, it is recommended to use the ro=true property to ensure the file is not unnecessarily synchronized back from the database when the connection is closed. A data file is always required to be present  at the filename path during design-time. Without this file, operations like testing the connection, reading the model data, or reverse engineering the model will fail.
 All properties of the Complex File JDBC Driver are documented in the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator in Appendix C: Oracle Data Integrator Driver for Complex Files Reference.
David Allan has created a great viewlet Complex File Processing - 0 to 60 which shows the creation of a Complex File data server as well as a model based on this server.
How to Create Models based on an Complex File Schema
Once physical schema and logical schema have been created, the Complex File can be used to create a Model as if it were based on a database. When reverse-engineering the Model, data stores(tables) for each XSD element of complex type will be created. Use of complex files as sources is straightforward; when using them as targets it has to be made sure that all dependent tables have matching PK-FK pairs; the same applies to the XML driver as well.
Debugging and Error Handling
There are different ways to test an nXSD file. The Native Format Builder Wizard can be used even if the nXSD wasn’t created in it; it will show issues related to the schema and/or test data. In ODI, the nXSD  will be parsed and run against the existing test XML file when testing a connection in the Dataserver. If either the nXSD has an error or the data is non-compliant to the schema, an error will be displayed.
Sample error message:
  Error while reading native data.
    [Line=1, Col=5] Not enough data available in the input, when trying to read data of length "19" for "element with name D1" from the specified position, using "style" as "fixedLength" and "length" as "". Ensure that there is enough data from the specified position in the input.
  
Complex File FAQ
Is the size of the native file limited by available memory?
  No, since the native data is streamed through the driver, only the available space in the staging database limits the size of the data. There are limits on individual field sizes, though; a single large object field needs to fit in memory.
Should I always use the complex file driver instead of the file driver in ODI now?
  No, use the file technology for all simple file parsing tasks, for example any fixed-length or delimited files that just have one row format and can be mapped into a simple table. Because of its narrow assumptions the ODI file driver is easy to configure within ODI and can stream file data without writing it into a database. The complex file driver should be used whenever the use case cannot be handled through the file driver.
Are we generating XML out of flat files before we write it into a database?
  We don’t materialize any XML as part of parsing a flat file, either in memory or on disk. The data produced by the XML parser is streamed in Java objects that just use XSD-derived nXSD schema as its type system. We use the nXSD schema because is the standard for describing complex flat file metadata in Oracle Fusion Middleware, and enables users to share schemas across products.
Is the nXSD file interchangeable with SOA Suite?
  Yes, ODI can use the same nXSD files as SOA Suite, allowing mixed use cases with the same data format. 
Can I start the Native Format Builder from the ODI Studio?
  No, the Native Format Builder has to be started from a JDeveloper with BPEL instance. You can get BPEL as part of the SOA Suite bundle. Users without SOA Suite can manually develop nXSD files using XSD editors. 
When is the database data written back to the native file?
  Data is synchronized using the SYNCHRONIZE and CREATE FILE commands, and when the JDBC connection is closed. It is recommended to set the ro or read_only property to true when a file is exclusively used for reading so that no unnecessary write-backs occur. 
Is the nXSD metadata part of the ODI Master or Work Repository?
  No, the data server definition in the master repository only contains the JDBC URL with file paths; the nXSD files have to be accessible on the file systems where the JDBC driver is executed during production, either by copying or by using a network file system.
Where can I find sample nXSD files?
  The Application Server Adapter Users Guide contains nXSD samples for various different use cases.