elffile: ELF Specific File Identification Utility
- by user9154181
Solaris 11 has a new standard user level command, /usr/bin/elffile.
elffile is a variant of the file utility that is focused exclusively on
linker-related files: ELF objects, archives, and runtime linker configuration
files. All other files are simply identified as "non-ELF". The primary
advantage of elffile over the existing file utility is in the area of
archives: elffile examines the archive members and can produce a summary
of the contents, or per-member details.
The impetus to add elffile to Solaris came from the effort to
extend the format of Solaris archives so that they could grow beyond
their previous 32-bit file limits. That work introduced a new archive
symbol table format. Now that there was more than one possible format,
I thought it would be useful if the file utility could identify which
format a given archive is using, leading me to extend the file utility:
% cc -c ~/hello.c
% ar r foo.a hello.o 
% file foo.a
foo.a:          current ar archive, 32-bit symbol table
% ar r -S foo.a hello.o 
% file foo.a
foo.a:          current ar archive, 64-bit symbol table
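In the ar format, the symbol table is conventionally the first member when it is present. That means the symbol table format can be read from the name in the first member header. The Python sketch below does this. It assumes that the 32-bit symbol table is the member named "/" and the 64-bit one is named "/SYM64/", which is the convention GNU ar also uses. symtab_format is a hypothetical name and is not part of any Solaris interface.

```python
AR_MAGIC = b"!<arch>\n"   # ar global header
AR_HDR_SIZE = 60          # fixed-size member header that follows it


def symtab_format(data: bytes) -> str:
    """Describe an archive roughly the way the extended file utility does.

    Assumption: a 32-bit symbol table is a first member named "/", and a
    64-bit symbol table is a first member named "/SYM64/".
    """
    if not data.startswith(AR_MAGIC):
        return "not an ar archive"
    desc = "current ar archive"
    if len(data) < len(AR_MAGIC) + AR_HDR_SIZE:
        return desc  # archive with no members at all
    hdr = data[len(AR_MAGIC):len(AR_MAGIC) + AR_HDR_SIZE]
    name = hdr[0:16].decode("ascii", "replace").rstrip(" ")  # ar_name field
    if name == "/":
        desc += ", 32-bit symbol table"
    elif name == "/SYM64/":
        desc += ", 64-bit symbol table"
    return desc  # no symbol table member (e.g. no objects in the archive)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: {symtab_format(f.read())}")
```

Run against the two versions of foo.a above, this should print the same two descriptions that file does, as long as the naming assumption holds.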
In turn, this caused me to think about all the things that I
would like the file utility to be able to tell me about an archive.
In particular, I'd like to be able to know what's inside without having
to unpack it. The end result of that train of thought was elffile.
Much of the discussion in this article is adapted from the PSARC
case I filed for elffile in December 2010:
PSARC 2010/432 elffile
Why file Is No Good For Archives And Yet Should Not Be Fixed
The standard /usr/bin/file utility is not very useful when applied
to archives. When identifying an archive, a user typically wants to
know two things:
    1. Is this an archive?
    2. Presupposing that the archive contains objects, which is
       by far the most common use for archives, what platform
       are the objects for? Are they for SPARC or x86? 32 or 64-bit?
       Some confusing combination from varying platforms?
The file utility provides a quick answer to question (1), as it identifies
all archives as "current ar archive". It does nothing to answer the
more interesting question (2). Answering that question requires a
multi-step process:
    Extract all archive members
    Use the file utility on the extracted files, examine the
       output for each file in turn, and compare the results to
       generate a suitable summary description.
    Remove the extracted files
It should be easier and more efficient to answer such an obvious question.
It would be reasonable to extend the file utility to examine archive
contents in place and produce a description. However, there are several
reasons why I decided not to do so:
    The correct design for this feature within the file utility
      would have file examine each archive member in turn, applying
      its full abilities to each member. This would be elegant, but
      also represents a rather dramatic redesign and re-implementation
      of file. Archives nearly always contain nothing but ELF objects
      for a single platform, so such generality in the file utility
      would be of little practical benefit.
    It is best to avoid adding new options to standard utilities
      for which other implementations of interest exist. In the case
      of the file utility, one concern is that we might add an option
      which later appears in the GNU version of file with a different
      and incompatible meaning. Indeed, there have been discussions about
      replacing the Solaris file with the GNU version in the past. This may
      or may not be desirable, and may or may not ever happen. Either way,
      I don't want to preclude it.
    Examining archive members is an O(n) operation in the number
      of members, and can be relatively slow with large archives.
      The file utility, by contrast, is expected to be very fast.
I decided that extending file in this way is overkill, and
that an investment in the file utility for better archive support
would not be worth the cost. A solution that is more narrowly focused on
ELF and other linker-related files is really all that we need. The necessary
code for doing this already exists within libelf. All that is missing is
a small user-level wrapper to make that functionality available at
the command line.
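In C, libelf provides archive traversal through elf_begin(), elf_next() and elf_getarhdr(). The hypothetical Python walker below shows what "examining archive contents in place" involves. It steps through the member headers without extracting anything. It assumes the common System V layout: 60-byte member headers, members padded to even offsets, a "/" or "/SYM64/" symbol table, and a "//" long-name string table. Each member is visited once, which is the O(n) cost noted earlier.

```python
from typing import Iterator, Tuple

AR_MAGIC = b"!<arch>\n"
AR_HDR_SIZE = 60  # fixed-size ar member header


def iter_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, contents) for each ordinary member, without extraction.

    Assumption: System V style archive. The symbol table is named "/" or
    "/SYM64/", and long names live in a "//" string table referenced as "/N".
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    off = len(AR_MAGIC)
    strtab = b""
    while off + AR_HDR_SIZE <= len(data):
        hdr = data[off:off + AR_HDR_SIZE]
        if hdr[58:60] != b"`\n":  # ar_fmag terminates every member header
            raise ValueError(f"bad member header at offset {off}")
        raw = hdr[0:16].decode("ascii", "replace").rstrip(" ")  # ar_name
        size = int(hdr[48:58].decode("ascii").strip())          # ar_size
        body = data[off + AR_HDR_SIZE:off + AR_HDR_SIZE + size]
        off += AR_HDR_SIZE + size + (size & 1)  # members start on even offsets
        if raw in ("/", "/SYM64/"):
            continue  # archive symbol table, not a real member
        if raw == "//":
            strtab = body  # long member name string table
            continue
        if raw.startswith("/") and raw[1:].isdigit():
            # "/N": the name starts at offset N in the string table and ends at "/\n"
            start = int(raw[1:])
            end = strtab.find(b"/\n", start)
            name = strtab[start:end if end >= 0 else len(strtab)].decode("ascii", "replace")
        else:
            name = raw.rstrip("/")  # short names carry a trailing "/"
        yield name, body


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for name, body in iter_members(f.read()):
                print(f"{path}({name}): {len(body)} bytes")
```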
In that vein, I considered adding an option for this to the elfdump utility.
I examined elfdump carefully, and even wrote a prototype implementation.
The added code is small and simple, but the conceptual fit with the rest of
elfdump is poor. The result complicates elfdump syntax and documentation,
definite signs that this functionality does not belong there.
And so, I added this functionality as a new user level command.
The elffile Command
The syntax for this new command is:
    elffile [-s basic | detail | summary] filename...
Please see the elffile(1) manpage for additional details.
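The synopsis above can be read in two ways, so here is a concrete version: a hypothetical Python front end that accepts the same command line. In it, -s takes exactly one of the three style names, summary is the default, and at least one filename is required. It only mirrors the documented synopsis. It is not how elffile itself is implemented.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring: elffile [-s basic | detail | summary] filename..."""
    p = argparse.ArgumentParser(prog="elffile")
    p.add_argument("-s", dest="style", choices=("basic", "detail", "summary"),
                   default="summary",  # summary is the default output style
                   help="output style")
    p.add_argument("filename", nargs="+", help="files to identify")
    return p.parse_args(argv)
```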
To demonstrate how output from elffile looks, I will use the following
files:
    File          Description
    config        A runtime linker configuration file produced with crle
    dwarf.o       An ELF object
    /etc/passwd   A text file
    mixed.a       Archive containing a mixture of ELF and non-ELF members
    mixed_elf.a   Archive containing ELF objects for different machines
    not_elf.a     Archive containing no ELF objects
    same_elf.a    Archive containing a collection of ELF objects for the
                  same machine. This is the most common type of archive.
The file utility identifies these files as follows:
% file config dwarf.o /etc/passwd mixed.a mixed_elf.a not_elf.a same_elf.a
config:         Runtime Linking Configuration 64-bit MSB SPARCV9
dwarf.o:        ELF 64-bit LSB relocatable AMD64 Version 1
/etc/passwd:    ascii text
mixed.a:        current ar archive, 32-bit symbol table
mixed_elf.a:    current ar archive, 32-bit symbol table
not_elf.a:      current ar archive
same_elf.a:     current ar archive, 32-bit symbol table
By default, elffile uses its "summary" output style. This output differs
from the output from the file utility in two significant ways:
    Files that are not an ELF object, archive, or runtime linker
       configuration file are identified as "non-ELF", whereas the file
       utility attempts further identification for such files.
    When applied to an archive, the elffile output includes a description
       of the archive's contents, without requiring member extraction or
       other additional steps.
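How might a tool decide that a file is an ELF object and produce a description such as "ELF 64-bit LSB relocatable AMD64 Version 1"? One minimal approach is to check the ELF magic number and then decode EI_CLASS, EI_DATA, e_type, e_machine and e_version from the fixed part of the ELF header. This is a Python sketch under assumptions. The tables cover only a few values. The wording for types other than relocatable is a guess, and "Version" presumably comes from e_version. The real output also adds details such as hardware capabilities ("[SSE]"), which this sketch omits.

```python
import struct

ELFMAG = b"\x7fELF"
CLASS = {1: "32-bit", 2: "64-bit"}                      # e_ident[EI_CLASS]
DATA = {1: ("<", "LSB"), 2: (">", "MSB")}               # e_ident[EI_DATA]
# Assumption: descriptive wording other than "relocatable" is illustrative only.
TYPE = {1: "relocatable", 2: "executable", 3: "dynamic lib", 4: "core file"}
MACHINE = {2: "SPARC", 3: "80386", 18: "SPARC32PLUS", 43: "SPARCV9", 62: "AMD64"}


def describe_elf(data: bytes) -> str:
    """Return a file(1)-style description, or "non-ELF" for anything else."""
    if len(data) < 24 or not data.startswith(ELFMAG):
        return "non-ELF"
    ei_class, ei_data = data[4], data[5]
    if ei_class not in CLASS or ei_data not in DATA:
        return "non-ELF"
    order, enc = DATA[ei_data]
    # e_type, e_machine and e_version sit at the same offsets (16..24)
    # in both 32-bit and 64-bit ELF headers.
    e_type, e_machine, e_version = struct.unpack(order + "HHI", data[16:24])
    kind = TYPE.get(e_type, f"type {e_type}")
    mach = MACHINE.get(e_machine, f"machine {e_machine}")
    return f"ELF {CLASS[ei_class]} {enc} {kind} {mach} Version {e_version}"


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: {describe_elf(f.read(64))}")
```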
Applying elffile to the above files:
% elffile config dwarf.o /etc/passwd mixed.a mixed_elf.a not_elf.a same_elf.a
config: Runtime Linking Configuration 64-bit MSB SPARCV9
dwarf.o: ELF 64-bit LSB relocatable AMD64 Version 1
/etc/passwd: non-ELF
mixed.a: current ar archive, 32-bit symbol table, mixed ELF and non-ELF content
mixed_elf.a: current ar archive, 32-bit symbol table, mixed ELF content
not_elf.a: current ar archive, non-ELF content
same_elf.a: current ar archive, 32-bit symbol table, ELF 64-bit LSB relocatable AMD64 Version 1
The output for same_elf.a is of particular interest: The vast majority
of archives contain only ELF objects for a single platform, and in this
case, the default output from elffile answers both of the questions about
archives posed at the beginning of this discussion, in a single efficient
step. This makes elffile considerably more useful than file, within the
realm of linker-related files.
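The four archive summaries in the output above suggest a simple rule for combining per-member results. The sketch below reconstructs that rule from those examples only. It is not taken from the elffile source. How elffile treats empty archives, or members that differ only in capabilities, is not covered here. summarize and archive_line are hypothetical helpers, and member descriptions are assumed to use the string "non-ELF" for non-ELF members.

```python
from typing import List, Optional


def summarize(member_descs: List[str]) -> Optional[str]:
    """Collapse per-member descriptions into one summary phrase.

    Assumption: returns None (nothing appended) for an archive with no members.
    """
    if not member_descs:
        return None
    elf = {d for d in member_descs if d != "non-ELF"}  # distinct ELF descriptions
    has_non_elf = "non-ELF" in member_descs
    if not elf:
        return "non-ELF content"
    if has_non_elf:
        return "mixed ELF and non-ELF content"
    if len(elf) > 1:
        return "mixed ELF content"
    return elf.pop()  # homogeneous archive: reuse the single ELF description


def archive_line(name: str, archive_desc: str, member_descs: List[str]) -> str:
    parts = [archive_desc]
    summary = summarize(member_descs)
    if summary:
        parts.append(summary)
    return f"{name}: " + ", ".join(parts)
```

For same_elf.a, every member yields the same description, so the summary is just that description. This is the case that answers both questions in one step.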
elffile can produce output in two other styles, "basic" and "detail".
The basic style produces the same output as 'file' for linker-related
files. The detail style produces per-member identification of archive
contents. This is useful when the archive does not contain homogeneous
ELF objects and more information is needed than the summary output
provides:
% elffile -s detail mixed.a     
mixed.a: current ar archive, 32-bit symbol table
mixed.a(dwarf.o): ELF 32-bit LSB relocatable 80386 Version 1
mixed.a(main.c): non-ELF content
mixed.a(main.o): ELF 64-bit LSB relocatable AMD64 Version 1 [SSE]
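The detail style is the archive line followed by one archive(member) line per member. Below is a sketch of just that formatting step. It assumes the member descriptions have already been computed and that non-ELF members are marked with the string "non-ELF"; both the marker and the name detail_lines are illustrative choices made here, not part of elffile.

```python
from typing import Iterable, List, Tuple


def detail_lines(archive: str, archive_desc: str,
                 members: Iterable[Tuple[str, str]]) -> List[str]:
    """Format detail-style output from (member name, description) pairs."""
    lines = [f"{archive}: {archive_desc}"]
    for name, desc in members:
        if desc == "non-ELF":
            desc = "non-ELF content"  # wording used for members in the output above
        lines.append(f"{archive}({name}): {desc}")
    return lines
```

Calling it with the three members of mixed.a and printing the lines joined with newlines reproduces the transcript above.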