Friday, December 22, 2006

Garbage Collection - Memory Management by Negligence

I realize that calling garbage collection "negligent" memory management isn't really fair. But I've heard enough people argue that garbage collection is the cure for the disease that is C++ memory management bugs, e.g. arguments like these.

The classic C++ response to "new/delete makes bugs" is "manual memory management is fast." I'm not sure I would agree with this. I would say that C++ gives a programmer the flexibility to pick a memory management strategy and tune it to be fast for a given application. But I will argue three other reasons why I would rather have explicit than garbage collected memory management:
  1. Reproducible deallocation paths. When we get a bug in X-Plane where memory has been trashed, an object has been freed, or some other memory-related bug, the most important thing for us is that the bug be reproducible. If the sim employed generalized garbage collection, then a whole category of unrelated behavior could influence when objects are allocated and destroyed. I would even argue that garbage collection breaks encapsulation by allowing the behavior of objects to be influenced by unrelated subsystems in surprising ways (since they are all linked through a garbage collector).
  2. Explicit description of memory-allocation. One thing I like about X-Plane is that I can see where we deallocate memory. Each deallocation is programmed*. If I find a buggy deallocation, I can trace it back to an intended deallocate and then ask "what did I mean by this".
  3. Explicit memory management means programmers thinking about memory management. What I would argue is that you can make all the same kinds of mistakes in a garbage-collected system as you can in an explicit system, e.g. by making circular references between objects, etc. But no one ever said "when you use new/delete, just relax and don't think about memory - it'll just work".
*Not necessarily programmed by new/delete - I am all in favor of building up abstractions around memory management - sometimes even garbage collection.

Okay, now I've contradicted myself. I suppose a fairer statement would be that memory management strategies have implications. A programmer should pick a strategy for a given problem and realize that it's a design decision with trade-offs. Picking garbage collection has good and bad things about it, but like most design patterns, it is not appropriate for all code (and I would even say it's not appropriate for all OO code) and while it makes some things easier, it makes other things harder.
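
As a small illustration of the first point (reproducible deallocation paths), here is a minimal sketch. The names Tracked, g_destroy_log and load_and_release are made up for this example and have nothing to do with X-Plane's code. It only shows that with scoped, explicit lifetimes the order of destruction is fixed by the program text, so it is the same on every run.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log of destruction events, for illustration only.
static std::vector<std::string> g_destroy_log;

struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {}
    ~Tracked() { g_destroy_log.push_back(name); }
};

// Locals are destroyed in reverse order of construction when the scope
// exits, so the deallocation order does not depend on any other subsystem.
inline void load_and_release() {
    Tracked terrain("terrain");
    Tracked texture("texture");
    assert(g_destroy_log.empty());   // nothing has been freed yet
}   // ~texture runs first, then ~terrain
```

Calling load_and_release() with an empty log leaves "texture" followed by "terrain" in it. A breakpoint in ~Tracked will always be hit from the same place.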

Thursday, December 14, 2006

Instrumentation

nVidia has a very cool tool called NVPerfHUD - it's an application that provides on-screen diagnostics and debugging for graphics-intensive applications. Unfortunately for us it has two problems:
  1. It's Windows only and we do 99% of X-Plane development on Macs.
  2. It's nVidia only and we have more ATI hardware in our Macs than nVidia. (Not our fault - that's what Apple ships!)
Fortunately (and typically for an application that's gone through 8 major revisions) X-Plane already has a lot of these things built right into the app. When working on a long-term code base, the investment in built-in diagnostic code is well worth it...perhaps these will give you some ideas on how to add instrumentation to your application.

All of X-Plane's instrumentation is zero-overhead when not used, and relatively low overhead when used, and it ships in the final application. We do this because we can, and also because it allows us to debug in-field apps without having to send out special builds.

Stats Counters
X-Plane uses the plugin dataref system to export a series of private stats counters to a diagnostic plugin for on-screen analysis. The stats counters show everything from the number of cars drawn to the number of segments of the planet view that are rendered.

Stats counters give us a better picture of the internal state of the application. If a user reports slower framerate, the stats counters can help us tell why. Is it because we're drawing too many cars, or because the planet is being drawn?
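
To give a rough idea of the shape of such a system, here is a minimal sketch. It is not the X-Plane dataref API: StatsRegistry, CarRenderer and the counter name used below are all assumptions made up for illustration. The point is that the engine keeps a plain int it was going to maintain anyway, and the diagnostic side only reads it on demand.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry mapping a counter name to the address of an int
// that the engine already maintains.
class StatsRegistry {
public:
    void publish(const std::string& name, const int* value) {
        assert(value != nullptr);   // fail early on a bad registration
        counters_[name] = value;
    }
    // Returns the current value, or -1 if no such counter was published.
    int read(const std::string& name) const {
        auto it = counters_.find(name);
        return it == counters_.end() ? -1 : *it->second;
    }
private:
    std::map<std::string, const int*> counters_;
};

// Example engine-side counter: a plain int bumped in the draw loop.
struct CarRenderer {
    int cars_drawn = 0;
    void draw_frame(int visible_cars) {
        cars_drawn = 0;
        for (int i = 0; i < visible_cars; ++i)
            ++cars_drawn;           // stand-in for a real draw call
    }
};
```

A diagnostic tool would publish &cars.cars_drawn once under some name (say "stats/cars_drawn") and then call read() each frame. When nobody reads the counter, the only cost is the increment.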

Art Tuning
We also use datarefs to export a series of tuning values for our artists. They can adjust the overall look of lights, cars, the propeller, etc. via these variables. This lets them work in real time, tuning the sim and seeing changes immediately. Once they reach values they like, we set them as the defaults in the sim.

Perf Flags
OpenGL is a pipeline - if any stage of that pipeline slows down, your framerate sinks. So in order to figure out why X-Plane is slow, we need to know which part of the pipeline is overloaded. To that end we have a series of performance flags (again datarefs) that can be set to intentionally change loading of the pipeline. This is an idea inspired by NVPerfHUD, but implemented directly in our engine.
  • One flag will turn off the flight model, lowering CPU load.
  • One flag will change the clip volume, limiting the amount of vertex processing (and all that follows).
  • Another flag will replace all textures with a 2x2 proxy, relieving pressure on AGP bandwidth and on-card VRAM bandwidth.
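
The flags above could be sketched roughly like this. PerfFlags, FramePlan, plan_frame and the numeric values are hypothetical stand-ins, not X-Plane's actual implementation. Each flag removes load from exactly one stage, so a jump in framerate points at that stage.

```cpp
#include <cassert>

// Hypothetical flags; in the sim these would be exposed as datarefs.
struct PerfFlags {
    bool skip_flight_model = false;  // lowers CPU load
    bool tiny_clip_volume  = false;  // limits vertex processing
    bool proxy_textures    = false;  // relieves texture bandwidth
};

// The work one frame will perform, expressed as plain numbers so the
// effect of each flag is easy to see.
struct FramePlan {
    bool  run_flight_model;
    float far_clip_meters;
    int   texture_edge_pixels;
};

inline FramePlan plan_frame(const PerfFlags& f) {
    FramePlan p;
    p.run_flight_model    = !f.skip_flight_model;
    p.far_clip_meters     = f.tiny_clip_volume ? 10.0f : 40000.0f;  // made-up distances
    p.texture_edge_pixels = f.proxy_textures ? 2 : 1024;            // 2x2 proxy vs. made-up full size
    return p;
}
```

Flip one flag at a time: if the framerate doesn't move, that stage wasn't the bottleneck.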

FPS Test
X-Plane ships with a command-line based framerate test. The framerate test controls all sim settings and automatically logs framerate. The framerate test gives us an easy way to regress new code and make sure we haven't hurt performance. It also gives us a definite way to assess the performance of machines in the field.

Hidden Commands
X-Plane exports some hidden commands via the plugin system. (You must have our internal plugin to use them right now.) For example, all pixel shaders can be reloaded from disk without rebooting the sim, which speeds up the development cycle a lot. This kind of functionality is built right into our engine - our shader object understands reloading.

Compiler Flags
A few more invasive debugging techniques require #defines to be flipped inside the sim. This includes a lot of logging options (all that file output kills framerate, so we don't even mess with leaving this on or piping it into the bit bucket) which let us really see what's going on all the way down the scene graph. We can also turn on things like wire frames and stepped drawing (drawing the frame one part at a time and swapping the buffer to see the results).
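
A compile-time logging switch of this kind might look like the sketch below. The macro names DEBUG_LOG_SCENE_GRAPH and SG_LOG and the function draw_node are assumptions for illustration, not names from the sim. When the switch is 0 the log call expands to nothing, so there is no file output and the arguments are never evaluated.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical switch; flip to 1 and rebuild to get scene-graph logging.
#ifndef DEBUG_LOG_SCENE_GRAPH
#define DEBUG_LOG_SCENE_GRAPH 0
#endif

#if DEBUG_LOG_SCENE_GRAPH
    #define SG_LOG(...) std::fprintf(stderr, __VA_ARGS__)
#else
    // Expands to a no-op; the arguments disappear at preprocessing time.
    #define SG_LOG(...) ((void)0)
#endif

inline int draw_node(int node_id, int child_count) {
    SG_LOG("draw node %d (%d children)\n", node_id, child_count);
    return child_count + 1;   // stand-in for real drawing work
}
```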

Adaptive Sampling Profiler
The last tool we have is remote scripting of Shark, Apple's adaptive sampling profiler, via a plugin. I can't say enough good things about Shark; it's just a really great tool. Via the plugin system we can script Shark profiling, giving us very accurate profiling of specific blocks. This stuff is normally off and has to be #defined on, since it's a bit invasive (e.g. when we have Shark attached we don't want to profile every single part of the app, because we'll spend all our time waiting for Shark to process the captured samples).

If there's a moral to the story, I suppose it's that it only takes a few more minutes to change a hacked up, temporary, one-off debugging facility into a permanent, reusable, scalable, clean debugging facility, but you get payback every time you work on the codebase. And the payoff for writing code that's designed for analysis and debugging from day one (especially for OpenGL, where so much of the subsystem is opaque, and bugs usually manifest as a black screen) is even greater.

Tuesday, December 12, 2006

Hemophiliac Code

I managed to slice myself pretty thoroughly while trying to make bagel chips tonight. Besides my surprise both at how deep the cut was and how stupid I am, I had another thought tonight as I type, with my thumb in a bandaid but otherwise working normally: my thumb's self-repair system works really really well.

Compare that to a piece of code. You're running along in a happy function and you hit a null object pointer. But you're really supposed to call that method, unconditionally. What to do? Call it and we bus error. Don't call it and, well, we've defied the logic of the program!

The advantage my thumb has over my code is that it knows pretty much what the right thing to do is under certain (predictable) problem conditions. Blood exposed to open air...probably we've been cut - let's clot. (This is similar to a pilot experiencing an engine failure. It's not good, but it's not unexpected, so it's possible to respond in a way that will maximize the chance for success.)

Given that there is a whole category of code defects that we can detect but cannot hope to repair, most programmers take the opposite approach: if we can't hope to survive damage, let's make sure we die every single time! The logic is, better to know that we're getting injured, even if the symptom is the program dying in the lab, than to have unknown damage under the surface that will cause death in the field.

Perhaps a reasonable approach would be, "die early, die often". We never want to have an internal defect and not report it, and we want to report it as early as possible, as that's when we can do the best job of reporting it. Early detection is a good thing in debugging.

Early detection has become even more important in X-Plane as we start to thread our code. To utilize dual-core hardware, we do some of the CPU-intensive work of constructing our 3-d scenery details on the second core. The main thread farms this to a worker thread, who then tosses it back to the main thread to insert into the scene graph between frames.

The problem is: if something goes wrong during scene-graph insertion, we really don't have any idea why. We don't know who called us, because we've just got a chunk of finished geometry (and they all look the same) and the actual code that did the work exited long ago, leaving no call-stack.

Early detection is thus a huge benefit. If we can get our failure on the worker thread as the instantiation happens (rather than later as we edit the scene graph) then we can break into the debugger and play in a wonderland of symbols, local variables, and data.
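
Here is a minimal sketch of that idea, with made-up names (Mesh, mesh_is_valid, build_mesh) and a made-up notion of what "valid" means. The check runs inside the function that builds the geometry, which is the code that would execute on the worker thread, so a failure stops while the builder's call stack and locals still exist. The sketch throws so it can be exercised in a test; an assert or debugger break would serve the same purpose.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical finished geometry: a flat list of x,y,z coordinates.
struct Mesh { std::vector<float> xyz; };

// Returns true if the mesh is safe to hand to the scene graph.
inline bool mesh_is_valid(const Mesh& m) {
    if (m.xyz.empty() || m.xyz.size() % 3 != 0) return false;
    for (float v : m.xyz)
        if (!std::isfinite(v)) return false;   // rejects NaN and infinity
    return true;
}

// Would run on the worker thread. Failing here, rather than later during
// scene-graph insertion, tells us exactly who produced the bad data.
inline Mesh build_mesh(const std::vector<float>& source) {
    Mesh m;
    m.xyz = source;
    if (!mesh_is_valid(m))
        throw std::runtime_error("build_mesh: produced invalid geometry");
    return m;
}
```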

(Final off topic thought: why is this code bad? Hint: it's not the algorithm that's bad.)

#include <math.h>   // sqrt, acos; PI is assumed to be defined elsewhere
inline float sqr(float x) { return x*x; }
inline float pythag(float x, float y, float z) {
    return sqrt(sqr(x)+sqr(y)+sqr(z)); }
float angle_between(float vec1[3], float vec2[3])
{
    float l1=pythag(vec1[0],vec1[1],vec1[2]);
    float l2=pythag(vec2[0],vec2[1],vec2[2]);
    if(l1 != 0.0) l1 = 1.0 / l1;
    if(l2 != 0.0) l2 = 1.0 / l2;
    float v1[3] = { vec1[0] * l1, vec1[1] * l1, vec1[2] * l1 };
    float v2[3] = { vec2[0] * l2, vec2[1] * l2, vec2[2] * l2 };
    float dot = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2];
    return acos(dot) * 180.0 / PI;
}

Monday, December 11, 2006

Intrinsic Linked Lists for Static Construction

This is another excuse to make sure the blogger move to beta hasn't killed all my blogs. In the past I ranted about not being able to move to blogger beta (this blog moved, the others did not). Now I am happily united entirely on the beta blogger...web bliss is mine. (It would be nice if the old posts listed the correct authors, but that's what we get for flirting with WordPress.)

A while ago I wrote a lot trying to explain how the hell global static construction works in C++. The best simple summary I can give you is:
  • It doesn't do what you think.
  • Your code will explode in ways you didn't expect.
I also tried to explain the joys of intrinsic linked lists, that is structs that contain a next pointer. (Don't pass these up for the STL - sometimes the old-school technique works better, especially when the issue isn't O(n) but how good your implementation is. Are you sure your app isn't bottlenecked by memory allocation?)

Like peanut butter and chocolate, these two ideas go well together. That is...static construction problems can be fixed by using intrinsic linked lists. This code is guaranteed to make your life hell:

class foo {
public:
    static set<foo *> all_of_me;
    foo() { all_of_me.insert(this); }
    ~foo() { all_of_me.erase(this); }
    // more stuff
};

The idea is that the foo class tracks all of its own instances. This is all good until you do this in some CPP file other than the one where all_of_me is defined:

static foo this_static_var_will_kill_me;

First the solution, then why it works. The solution is simply this:

class foo {
public:
    static foo * first_of_me;
    foo * next;
    foo() { this->next = first_of_me; first_of_me = this; }
    // more stuff
};

Now foo uses an intrinsic linked list instead of an STL set and we can declare static global foo objects all the time!

Analysis:
  • The problem with static construction is that C++ doesn't guarantee that static global objects will be initialized in any particular order.
  • When a class has a static member variable that is in turn a complex class, that static member variable is effectively a static global object, and thus it will be constructed in no particular order.*
  • In the case of our "foo" example, if the particular foo object is constructed before the set "all_of_me" is constructed, then foo's constructor will try to stick "this" into a set whose contents are probably entirely zero.
  • If that doesn't crash then when the set is constructed (after the foo object) the set will be fully reinitialized, causing our object to be lost.
(For this reason we hope for the crash - unfortunately some STL containers like vector will often fail quietly when they are used before initialization, as long as they're zeroed out first.)

The beauty of intrinsic lists is: C++ guarantees it will do all of the static initialization (that is, zero-setting, etc.) before any functions or methods are called. So we can be sure our list head pointer is NULL, and that's all we need to get started.
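
For completeness, here is a fuller sketch of the intrinsic-list version that compiles on its own. The id member, the accessors and the two global objects g_a and g_b are additions for illustration only. The list head is a plain pointer initialized with a constant, so it is already null before any foo constructor can run, no matter which file the static foo objects live in.

```cpp
#include <cassert>

class foo {
public:
    explicit foo(int id) : id_(id), next_(first_of_me) { first_of_me = this; }
    int id() const { return id_; }
    const foo* next() const { return next_; }
    static const foo* first() { return first_of_me; }
    // Walks the intrinsic list and counts the registered objects.
    static int count() {
        int n = 0;
        for (const foo* f = first_of_me; f; f = f->next_) ++n;
        return n;
    }
private:
    int  id_;
    foo* next_;
    static foo* first_of_me;   // plain pointer: no constructor to run
};

// Constant initialization: done before any dynamic initialization.
foo* foo::first_of_me = nullptr;

// Static globals like these could be declared in any CPP file.
static foo g_a(1);
static foo g_b(2);
```

Each new object pushes itself onto the front, so within this one file the list reads g_b, then g_a. Note that this sketch never unlinks: it is only safe for objects that live until the program exits. An object with a shorter lifetime would need a destructor that removes itself from the list.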

One final note - as I define categories for this post I see Chris has categories for both "rants" and "IIS". I can't imagine that you'd ever want IIS without the rant tag.

* Okay, so there are some rules on construction order. Trust me, they won't help you do anything productive!