Much Ado About Nothing: Stub Objects

Posted by user9154181 on Oracle Blogs See other posts from Oracle Blogs or by user9154181
Published on Fri, 11 Nov 2011 10:41:30 -0600 Indexed on 2011/11/11 18:09 UTC
Read the original article Hit count: 605

Filed under:

/Sun

The Solaris 11 link-editor (ld) contains support for a new type of object that we call a stub object. A stub object is a shared object, built entirely from mapfiles, that supplies the same linking interface as the real object, while containing no code or data. Stub objects cannot be executed — the runtime linker will kill any process that attempts to load one. However, you can link to a stub object as a dependency, allowing the stub to act as a proxy for the real version of the object.

You may well wonder if there is a point to producing an object that contains nothing but linking interface. As it turns out, stub objects are very useful for building large bodies of code such as Solaris. In the last year, we've had considerable success in applying them to one of our oldest and thorniest build problems. In this discussion, I will describe how we came to invent these objects, and how we apply them to building Solaris.

This posting explains where the idea for stub objects came from, and details our long and twisty journey from hallway idea to standard link-editor feature. I expect that these details are mainly of interest to those who work on Solaris and its makefiles, those who have done so in the past, and those who work with other similar bodies of code. A subsequent posting will omit the history and background details, and instead discuss how to build and use stub objects. If you are mainly interested in what stub objects are, and don't care about the underlying software war stories, I encourage you to skip ahead.

The Long Road To Stubs

This all started for me with an email discussion in May of 2008, regarding a change request that was filed in 2002, entitled:

4631488 lib/Makefile is too patient: .WAITs should be reduced

This CR encapsulates a number of cronic issues with Solaris builds:

We build Solaris with a parallel make (dmake) that tries to build as much of the code base in parallel as possible. There is a lot of code to build, and we've long made use of parallelized builds to get the job done quicker. This is even more important in today's world of massively multicore hardware.
Solaris contains a large number of executables and shared objects. Executables depend on shared objects, and shared objects can depend on each other. Before you can build an object, you need to ensure that the objects it needs have been built. This implies a need for serialization, which is in direct opposition to the desire to build everying in parallel.
To accurately build objects in the right order requires an accurate set of make rules defining the things that depend on each other. This sounds simple, but the reality is quite complex. In practice, having programmers explicitly specify these dependencies is a losing strategy:
- It's really hard to get right.
- It's really easy to get it wrong and never know it because things build anyway.
- Even if you get it right, it won't stay that way, because dependencies between objects can change over time, and make cannot help you detect such drifing.
- You won't know that you got it wrong until the builds break. That can be a long time after the change that triggered the breakage happened, making it hard to connect the cause and the effect. Usually this happens just before a release, when the pressure is on, its hard to think calmly, and there is no time for deep fixes.
As a poor compromise, the libraries in core Solaris were built using a set of grossly incomplete hand written rules, supplemented with a number of dmake .WAIT directives used to group the libraries into sets of non-interacting groups that can be built in parallel because we think they don't depend on each other.
From time to time, someone will suggest that we could analyze the built objects themselves to determine their dependencies and then generate make rules based on those relationships. This is possible, but but there are complications that limit the usefulness of that approach:
- To analyze an object, you have to build it first. This is a classic chicken and egg scenario.
- You could analyze the results of a previous build, but then you're not necessarily going to get accurate rules for the current code.
- It should be possible to build the code without having a built workspace available.
- The analysis will take time, and remember that we're constantly trying to make builds faster, not slower.
By definition, such an approach will always be approximate, and therefore only incremantally more accurate than the hand written rules described above. The hand written rules are fast and cheap, while this idea is slow and complex, so we stayed with the hand written approach.

Solaris was built that way, essentially forever, because these are genuinely difficult problems that had no easy answer. The makefiles were full of build races in which the right outcomes happened reliably for years until a new machine or a change in build server workload upset the accidental balance of things. After figuring out what had happened, you'd mutter "How did that ever work?", add another incomplete and soon to be inaccurate make dependency rule to the system, and move on. This was not a satisfying solution, as we tend to be perfectionists in the Solaris group, but we didn't have a better answer. It worked well enough, approximately.

And so it went for years. We needed a different approach — a new idea to cut the Gordian Knot.

In that discussion from May 2008, my fellow linker-alien Rod Evans had the initial spark that lead us to a game changing series of realizations:

The link-editor is used to link objects together, but it only uses the ELF metadata in the object, consisting of symbol tables, ELF versioning sections, and similar data. Notably, it does not look at, or understand, the machine code that makes an object useful at runtime.
If you had an object that only contained the ELF metadata for a dependency, but not the code or data, the link-editor would find it equally useful for linking, and would never know the difference. Call it a stub object.
In the core Solaris OS, we require all objects to be built with a link-editor mapfile that describes all of its publically available functions and data. Could we build a stub object using the mapfile for the real object?
It ought to be very fast to build stub objects, as there are no input objects to process.
Unlike the real object, stub objects would not actually require any dependencies, and so, all of the stubs for the entire system could be built in parallel.
When building the real objects, one could link against the stub objects instead of the real dependencies. This means that all the real objects can be built built in parallel too, without any serialization. We could replace a system that requires perfect makefile rules with a system that requires no ordering rules whatsoever. The results would be considerably more robust.

We immediately realized that this idea had potential, but also that there were many details to sort out, lots of work to do, and that perhaps it wouldn't really pan out. As is often the case, it would be necessary to do the work and see how it turned out.

Following that conversation, I set about trying to build a stub object. We determined that a faithful stub has to do the following:

Present the same set of global symbols, with the same ELF versioning, as the real object.
Functions are simple — it suffices to have a symbol of the right type, possibly, but not necessarily, referencing a null function in its text segment.
Copy relocations make data more complicated to stub. The possibility of a copy relocation means that when you create a stub, the data symbols must have the actual size of the real data. Any error in this will go uncaught at link time, and will cause tragic failures at runtime that are very hard to diagnose.
For reasons too obscure to go into here, involving tentative symbols, it is also important that the data reside in bss, or not, matching its placement in the real object.
If the real object has more than one symbol pointing at the same data item, we call these aliased symbols. All data symbols in the stub object must exhibit the same aliasing as the real object.

We imagined the stub library feature working as follows:

A command line option to ld tells it to produce a stub rather than a real object. In this mode, only mapfiles are examined, and any object or shared libraries on the command line are are ignored.
The extra information needed (function or data, size, and bss details) would be added to the mapfile.
When building the real object instead of the stub, the extra information for building stubs would be validated against the resulting object to ensure that they match.

In exploring these ideas, I immediately run headfirst into the reality of the original mapfile syntax, a subject that I would later write about as The Problem(s) With Solaris SVR4 Link-Editor Mapfiles. The idea of extending that poor language was a non-starter. Until a better mapfile syntax became available, which seemed unlikely in 2008, the solution could not involve extentions to the mapfile syntax.

Instead, we cooked up the idea (hack) of augmenting mapfiles with stylized comments that would carry the necessary information. A typical definition might look like:

# DATA(i386)              __iob                   0x3c0
# DATA(amd64,sparcv9)     __iob                   0xa00
# DATA(sparc)             __iob                   0x140
iob;

A further problem then became clear: If we can't extend the mapfile syntax, then there's no good way to extend ld with an option to produce stub objects, and to validate them against the real objects. The idea of having ld read comments in a mapfile and parse them for content is an unacceptable hack. The entire point of comments is that they are strictly for the human reader, and explicitly ignored by the tool.

Taking all of these speed bumps into account, I made a new plan:

A perl script reads the mapfiles, generates some small C glue code to produce empty functions and data definitions, compiles and links the stub object from the generated glue code, and then deletes the generated glue code.
Another perl script used after both objects have been built, to compare the real and stub objects, using data from elfdump, and validate that they present the same linking interface.

By June 2008, I had written the above, and generated a stub object for libc. It was a useful prototype process to go through, and it allowed me to explore the ideas at a deep level. Ultimately though, the result was unsatisfactory as a basis for real product. There were so many issues:

The use of stylized comments were fine for a prototype, but not close to professional enough for shipping product. The idea of having to document and support it was a large concern.
The ideal solution for stub objects really does involve having the link-editor accept the same arguments used to build the real object, augmented with a single extra command line option. Any other solution, such as our prototype script, will require makefiles to be modified in deeper ways to support building stubs, and so, will raise barriers to converting existing code.
A validation script that rederives what the linker knew when it built an object will always be at a disadvantage relative to the actual linker that did the work.
A stub object should be identifyable as such. In the prototype, there was no tag or other metadata that would let you know that they weren't real objects. Being able to identify a stub object in this way means that the file command can tell you what it is, and that the runtime linker can refuse to try and run a program that loads one.

At that point, we needed to apply this prototype to building Solaris. As you might imagine, the task of modifying all the makefiles in the core Solaris code base in order to do this is a massive task, and not something you'd enter into lightly. The quality of the prototype just wasn't good enough to justify that sort of time commitment, so I tabled the project, putting it on my list of long term things to think about, and moved on to other work. It would sit there for a couple of years.

Semi-coincidentally, one of the projects I tacked after that was to create a new mapfile syntax for the Solaris link-editor. We had wanted to do something about the old mapfile syntax for many years. Others before me had done some paper designs, and a great deal of thought had already gone into the features it should, and should not have, but for various reasons things had never moved beyond the idea stage. When I joined Sun in late 2005, I got involved in reviewing those things and thinking about the problem. Now in 2008, fresh from relearning for the Nth time why the old mapfile syntax was a huge impediment to linker progress, it seemed like the right time to tackle the mapfile issue. Paving the way for proper stub object support was not the driving force behind that effort, but I certainly had them in mind as I moved forward.

The new mapfile syntax, which we call version 2, integrated into Nevada build snv_135 in in February 2010:

6916788 ld version 2 mapfile syntax
PSARC/2009/688 Human readable and extensible ld mapfile syntax

In order to prove that the new mapfile syntax was adequate for general purpose use, I had also done an overhaul of the ON consolidation to convert all mapfiles to use the new syntax, and put checks in place that would ensure that no use of the old syntax would creep back in. That work went back into snv_144 in June 2010:

6916796 OSnet mapfiles should use version 2 link-editor syntax

That was a big putback, modifying 517 files, adding 18 new files, and removing 110 old ones.

I would have done this putback anyway, as the work was already done, and the benefits of human readable syntax are obvious. However, among the justifications listed in CR 6916796 was this

We anticipate adding additional features to the new mapfile language that will be applicable to ON, and which will require all sharable object mapfiles to use the new syntax.

I never explained what those additional features were, and no one asked. It was premature to say so, but this was a reference to stub objects. By that point, I had already put together a working prototype link-editor with the necessary support for stub objects. I was pleased to find that building stubs was indeed very fast. On my desktop system (Ultra 24), an amd64 stub for libc can can be built in a fraction of a second:

    % ptime ld -64 -z stub -o stubs/libc.so.1 -G -hlibc.so.1 \
      -ztext -zdefs -Bdirect ...

    real        0.019708910
    user        0.010101680
    sys         0.008528431

In order to go from prototype to integrated link-editor feature, I knew that I would need to prove that stub objects were valuable. And to do that, I knew that I'd have to switch the Solaris ON consolidation to use stub objects and evaluate the outcome. And in order to do that experiment, ON would first need to be converted to version 2 mapfiles. Sub-mission accomplished.

Normally when you design a new feature, you can devise reasonably small tests to show it works, and then deploy it incrementally, letting it prove its value as it goes. The entire point of stub objects however was to demonstrate that they could be successfully applied to an extremely large and complex code base, and specifically to solve the Solaris build issues detailed above. There was no way to finesse the matter — in order to move ahead, I would have to successfully use stub objects to build the entire ON consolidation and demonstrate their value. In software, the need to boil the ocean can often be a warning sign that things are trending in the wrong direction. Conversely, sometimes progress demands that you build something large and new all at once. A big win, or a big loss — sometimes all you can do is try it and see what happens.

And so, I spent some time staring at ON makefiles trying to get a handle on how things work, and how they'd have to change. It's a big and messy world, full of complex interactions, unspecified dependencies, special cases, and knowledge of arcane makefile features...

...and so, I backed away, put it down for a few months and did other work...

...until the fall, when I felt like it was time to stop thinking and pondering (some would say stalling) and get on with it. Without stubs, the following gives a simplified high level view of how Solaris is built:

An initially empty directory known as the proto, and referenced via the ROOT makefile macro is established to receive the files that make up the Solaris distribution.
A top level setup rule creates the proto area, and performs operations needed to initialize the workspace so that the main build operations can be launched, such as copying needed header files into the proto area.
Parallel builds are launched to build the kernel (usr/src/uts), libraries (usr/src/lib), and commands. The install makefile target builds each item and delivers a copy to the proto area. All libraries and executables link against the objects previously installed in the proto, implying the need to synchronize the order in which things are built.
Subsequent passes run lint, and do packaging.

Given this structure, the additions to use stub objects are:

A new second proto area is established, known as the stub proto and referenced via the STUBROOT makefile macro. The stub proto has the same structure as the real proto, but is used to hold stub objects. All files in the real proto are delivered as part of the Solaris product. In contrast, the stub proto is used to build the product, and then thrown away.
A new target is added to library Makefiles called stub. This rule builds the stub objects. The ld command is designed so that you can build a stub object using the same ld command line you'd use to build the real object, with the addition of a single -z stub option. This means that the makefile rules for building the stub objects are very similar to those used to build the real objects, and many existing makefile definitions can be shared between them.
A new target is added to the Makefiles called stubinstall which delivers the stub objects built by the stub rule into the stub proto. These rules reuse much of existing plumbing used by the existing install rule.
The setup rule runs stubinstall over the entire lib subtree as part of its initialization.
All libraries and executables link against the objects in the stub proto rather than the main proto, and can therefore be built in parallel without any synchronization.

There was no small way to try this that would yield meaningful results. I would have to take a leap of faith and edit approximately 1850 makefiles and 300 mapfiles first, trusting that it would all work out. Once the editing was done, I'd type make and see what happened. This took about 6 weeks to do, and there were many dark days when I'd question the entire project, or struggle to understand some of the many twisted and complex situations I'd uncover in the makefiles. I even found a couple of new issues that required changes to the new stub object related code I'd added to ld. With a substantial amount of encouragement and help from some key people in the Solaris group, I eventually got the editing done and stub objects for the entire workspace built. I found that my desktop system could build all the stub objects in the workspace in roughly a minute. This was great news, as it meant that use of the feature is effectively free — no one was likely to notice or care about the cost of building them.

After another week of typing make, fixing whatever failed, and doing it again, I succeeded in getting a complete build! The next step was to remove all of the make rules and .WAIT statements dedicated to controlling the order in which libraries under usr/src/lib are built. This came together pretty quickly, and after a few more speed bumps, I had a workspace that built cleanly and looked like something you might actually be able to integrate someday. This was a significant milestone, but there was still much left to do.

I turned to doing full nightly builds. Every type of build (open, closed, OpenSolaris, export, domestic) had to be tried. Each type failed in a new and unique way, requiring some thinking and rework. As things came together, I became aware of things that could have been done better, simpler, or cleaner, and those things also required some rethinking, the seeking of wisdom from others, and some rework. After another couple of weeks, it was in close to final form. My focus turned towards the end game and integration. This was a huge workspace, and needed to go back soon, before changes in the gate would made merging increasingly difficult.

At this point, I knew that the stub objects had greatly simplified the makefile logic and uncovered a number of race conditions, some of which had been there for years. I assumed that the builds were faster too, so I did some builds intended to quantify the speedup in build time that resulted from this approach. It had never occurred to me that there might not be one. And so, I was very surprised to find that the wall clock build times for a stock ON workspace were essentially identical to the times for my stub library enabled version! This is why it is important to always measure, and not just to assume.

One can tell from first principles, based on all those removed dependency rules in the library makefile, that the stub object version of ON gives dmake considerably more opportunities to overlap library construction. Some hypothesis were proposed, and shot down:

Could we have disabled dmakes parallel feature? No, a quick check showed things being build in parallel.
It was suggested that we might be I/O bound, and so, the threads would be mostly idle. That's a plausible explanation, but system stats didn't really support it. Plus, the timing between the stub and non-stub cases were just too suspiciously identical.
Are our machines already handling as much parallelism as they are capable of, and unable to exploit these additional opportunities? Once again, we didn't see the evidence to back this up.

Eventually, a more plausible and obvious reason emerged: We build the libraries and commands (usr/src/lib, usr/src/cmd) in parallel with the kernel (usr/src/uts). The kernel is the long leg in that race, and so, wall clock measurements of build time are essentially showing how long it takes to build uts. Although it would have been nice to post a huge speedup immediately, we can take solace in knowing that stub objects simplify the makefiles and reduce the possibility of race conditions. The next step in reducing build time should be to find ways to reduce or overlap the uts part of the builds. When that leg of the build becomes shorter, then the increased parallelism in the libs and commands will pay additional dividends. Until then, we'll just have to settle for simpler and more robust.

And so, I integrated the link-editor support for creating stub objects into snv_153 (November 2010) with

6993877 ld should produce stub objects PSARC/2010/397 ELF Stub Objects

followed by the work to convert the ON consolidation in snv_161 (February 2011) with

7009826 OSnet should use stub objects
4631488 lib/Makefile is too patient: .WAITs should be reduced

This was a huge putback, with 2108 modified files, 8 new files, and 2 removed files. Due to the size, I was allowed a window after snv_160 closed in which to do the putback. It went pretty smoothly for something this big, a few more preexisting race conditions would be discovered and addressed over the next few weeks, and things have been quiet since then.

Conclusions and Looking Forward

Solaris has been built with stub objects since February. The fact that developers no longer specify the order in which libraries are built has been a big success, and we've eliminated an entire class of build error. That's not to say that there are no build races left in the ON makefiles, but we've taken a substantial bite out of the problem while generally simplifying and improving things.

The introduction of a stub proto area has also opened some interesting new possibilities for other build improvements. As this article has become quite long, and as those uses do not involve stub objects, I will defer that discussion to a future article.

Developer IT