Wednesday, January 30, 2008

The Absence of Evidence...

...is not evidence of absence! This is definitely true when it comes to bugs. We could spend a year and not prove that X-Plane is bug free, but it only takes a minute to prove that it has bugs.

(This isn't unlike a problem such as integer factorization, where checking a solution is easy but finding one is hard; the asymmetry of proof provides the protection.)

Can you tell I've been reading "The Black Swan"? It's a book that's very relevant financially these days. Perhaps the tragedy of the housing bubble is that it moved everyone's houses from Mediocristan to Extremistan* without the people that owned the houses (that'd be all of us) noticing. Or perhaps we've been in Extremistan for a long time and just didn't realize.

* (These are Taleb's made-up names for systems that obey the law of large numbers vs. systems that experience chaotic outlier events.)

Tuesday, January 15, 2008

Burned by the STL - again!

I have blogged about this before, but it just burned me again.

Vector is usually very fast, but it also makes it really easy to write something really slow.

How about this?
struct forest_goo {
	vector<int>          stuff;       // element types are placeholders
	vector<vector<int> > more_stuff;
};
vector<forest_goo> many_forests;

How fast is this?

for(vector<forest_goo>::iterator i = many_forests.begin(); i != many_forests.end();)
	if(okay_forest(*i)) ++i;
	else i = many_forests.erase(i);

Here's a hint: not very. The problem is the mid-vector erase. Vector is a bit stupid - when you erase a vector item, it copies every item after it backward by one slot to close the hole. But copying a forest_goo means allocating memory for its vectors (and for a vector of vectors it's worse). So in our case erasing one forest_goo item mid-vector is horribly expensive as long as there is a lot more forest goo after it.

The solution is two-part:
  1. Specialize std::swap() for the forest_goo struct so that you can swap two of them and have them change places without allocating memory. (On most STLs, swap() of vectors is already specialized, so you just have to tell the compiler to swap all the pieces.)
  2. Rewrite the algorithm to swap the to-be-killed forest_goo structs to the end of the vector. The swap is a no-mem-alloc operation. When done, you can then chop the dead tail off of the vector with a single erase, which won't require copying forest_goo structs around.
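Here's a minimal sketch of both steps. The forest_goo members and okay_forest are placeholder stand-ins, not X-Plane's real code:

```cpp
#include <cstddef>
#include <vector>
using std::vector;

struct forest_goo {
	vector<int>          stuff;       // placeholder members
	vector<vector<int> > more_stuff;
};

// Step 1: swapping two forest_goo structs just swaps each vector's
// internals - a few pointers - with no allocation or element copies.
void swap(forest_goo& a, forest_goo& b)
{
	a.stuff.swap(b.stuff);
	a.more_stuff.swap(b.more_stuff);
}

// Stand-in predicate for whatever "keep this forest" really means.
bool okay_forest(const forest_goo& g) { return !g.stuff.empty(); }

// Step 2: swap the keepers to the front (preserving their order), then
// chop the dead tail off with a single erase - no element copies at all.
void prune(vector<forest_goo>& many_forests)
{
	size_t keep = 0;
	for (size_t i = 0; i < many_forests.size(); ++i)
		if (okay_forest(many_forests[i]))
		{
			if (i != keep) swap(many_forests[i], many_forests[keep]);
			++keep;
		}
	many_forests.erase(many_forests.begin() + keep, many_forests.end());
}
```

Note that std::remove_if would close the holes by copy-assignment, which re-allocates each vector it moves; the hand-rolled swap loop only shuffles vector internals.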

Saturday, January 12, 2008

X-Principles

I was reading about the history of X11 on Wikipedia while trying to fix Linux bugs and I stumbled across the principles of X. There are three I'd like to focus on, two of which I feel apply very well to X-Plane and one which does not.

The only thing worse than generalizing from one example is generalizing from no examples at all.

This is true, and in terms of X-Plane, I think this comes down to "don't add features that aren't needed." When you look at the "cost" of X-Plane in terms of how hard it is to write new code (and in some ways you could say Laminar Research does business by adding new code to X-Plane) our cost goes up with the number of existing features already in place that we have to keep working while we change things.

If a problem is not completely understood, it is probably best to provide no solution at all.

This also is very true for X-Plane. We want to provide long-term compatibility for X-Plane as a platform for airplanes, scenery and plugins. The risk of putting in solutions to problems that we don't understand is that two months later we'll go "ah drat, our APIs aren't quite right" but by that time third parties will be depending on the existing broken behavior.

Provide mechanism rather than policy. In particular, place user interface policy in the clients' hands.

I am critical of X, but you could deflect my criticism by pointing out that it stems from how X is used, which perhaps goes against its original intention. My issues, all stemming from the use of X as the lowest level of window management are:
  1. By making remote-desktop capabilities a core part of the spec, X11 introduces a complexity cost to desktop UI programming that isn't necessary a lot of the time.
  2. X is not a complete solution to a desktop environment layer.
The first criticism is open to debate - who am I to say what needs to be core and what doesn't? I will only say that if I have to sit down with the Carbon or Win32 APIs and write UI interaction code, the specification of APIs in terms of blocking and non-blocking makes UI development a lot easier than the specification of APIs in terms of two streams of asynchronous RPCs/events. So I would argue the cost is real, but whether it's worth it, well, that depends on how you use X. (I am focusing perhaps too narrowly on those who hope Linux will become a viable desktop platform.)

The second criticism isn't really fair at all - X isn't a complete solution, and it's meant to have stuff on top of it. I suppose my issue here is: X11 is pretty ubiquitous as the base layer in the Unix world, but what goes on top is open for debate.

And this goes to the third principle. If X11 is only the "bottom half" of a complete UI framework, it makes sense to provide mechanism and omit policy. But... is the bottom half of a complete UI framework useful to clients? It's the lack of policy, encoded in a simple, ubiquitous API that makes simple things easy and hard things possible, that frustrates me about coding on Linux.

So when it comes to X-Plane, I take a very different approach - I treat all of my SDK code as policy work. That is, we try to make it easy to do things we want authors to do and impossible to do things we don't want authors to do. In my experience this is the only reasonable option... given an open platform, users will do, well, everything possible.

Is it fair to be so heavy-handed? Call me a techno-fascist but I think it's essential. At the end of the day, third parties count on us to keep the platform stable so that they don't have to waste time updating previously finished products each time we post a new patch to X-Plane. The wider the array of weird behavior that we tolerate, the wider the array of weird behavior we have to treat as backward compatibility cases later. By using technical limitations to require authors to conform to specs the first time, we can avoid compatibility breaks later.

Monday, January 07, 2008

OpenGL and Threads: What's Wrong

So what's wrong with this code? The answer is that OpenGL is asynchronous, so traditional threading models tend to blow up with OpenGL for reasons that are hard to see (because the APIs look synchronous and act close enough to easily forget that they aren't).

Here's the fine print:
  • Calls to OpenGL from the same thread on the same context are executed in-order.
  • Calls from multiple contexts (and multiple threads always means multiple contexts) can execute out of order from the order they were issued.
  • All calls before a flush are executed before all calls after a flush, even across contexts, as long as the renderer is the same. (At least I think this is true.)
In the first example, a worker thread is preparing VBOs (buffers of geometry) and queueing them to a rendering thread that inserts them into the scene graph every now and then. But remember that both the VBO create and use are asynchronous and happen on other threads.

What ends up happening is that the VBO-create commands execute after the VBO-draw commands, because in-order execution is not guaranteed across contexts! This usually causes the driver to complain that the VBO isn't fully built. The solution is to flush after the VBO is created, which forces VBO creation to complete before any future calls.
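As a sketch of the fix (the hand-off function is a hypothetical stand-in for whatever queue moves work between threads; this won't run without a live GL context on the calling thread):

```cpp
// Worker thread: build and fill the VBO, then flush.
void build_vbo_on_worker(const float * geometry, GLsizeiptr bytes)
{
	GLuint vbo;
	glGenBuffers(1, &vbo);
	glBindBuffer(GL_ARRAY_BUFFER, vbo);
	glBufferData(GL_ARRAY_BUFFER, bytes, geometry, GL_STATIC_DRAW);
	glFlush();	// forces the fill above to complete before any later
			// command on any context - including the render thread's draw
	queue_vbo_for_render_thread(vbo);	// hypothetical cross-thread hand-off
}
```

Without that glFlush, nothing stops the render thread's draw from executing before the glBufferData has actually finished on the GPU side.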

In the second example, a PBO (buffer of image data) is filled on the rendering thread, then sent to a worker thread to extract and process. We have the same problem: because we're across threads, the OpenGL extract operation can execute before the fill operation. Once again, flushing after the fill on the first thread synchronizes us.

Generally speaking, OpenGL isn't a great API for threading. It has very few options for synchronization - glFlush has real (and often quite negative) performance implications and it's a real blunt tool.

There is a 'fence' operation that allows a thread to block until part of the command-stream has finished, but it isn't very useful because:
  • It's vendor specific (all Macs, NV PCs) so it's not everywhere. It's not fun to use two threading models in an app because the threading primitives only exist in one.
  • It's not cross-thread! The fence is not shared between contexts, so we can't block a worker thread based on completion in a rendering thread - we have to block the renderer itself, which sucks.
In X-Plane we work around these things by trying to use deferred rendering, e.g.:

while(1)
{
	draw_a_lot();
	read_old_async_data();	// results requested last frame are ready now
	queue_new_async_data();
}

The result is that async queries (PBO framebuffer read-backs, occlusion queries) have a full frame to complete before we ask for their results, which helps prevent blocking.