Friday, December 22, 2006

Garbage Collection - Memory Management by Negligence

I realize that calling garbage collection "negligent" memory management isn't really fair. But
I've heard enough people argue that garbage collection is the cure for the disease that is C++ memory management bugs (e.g. arguments like these) that I want to push back a little.

The classic C++ response to "new/delete makes bugs" is "manual memory management is fast." I'm not sure I would agree with this. I would say that C++ gives a programmer the flexibility to pick a memory management strategy and tune it to be fast for a given application. But I will argue three other reasons why I would rather have explicit than garbage-collected memory management:
  1. Reproducible deallocation paths. When we get a bug in X-Plane where memory has been trashed, an object has been freed, or some other memory-related bug, the most important thing for us is that the bug be reproducible. If the sim employed generalized garbage collection, then a whole category of unrelated behavior could influence when objects are allocated and destroyed. I would even argue that garbage collection breaks encapsulation by allowing the behavior of objects to be influenced by unrelated subsystems in surprising ways (since they are all linked through a garbage collector).
  2. Explicit description of memory deallocation. One thing I like about X-Plane is that I can see where we deallocate memory. Each deallocation is programmed*. If I find a buggy deallocation, I can trace it back to an intended deallocation and then ask "what did I mean by this?"
  3. Explicit memory management means programmers thinking about memory management. What I would argue is that you can make all the same kinds of mistakes in a garbage-collected system as you can in an explicit system, e.g. by making circular loops of objects, etc. But no one ever said "when you use new/delete, just relax and don't think about memory - it'll just work".
*Not necessarily programmed by new/delete - I am all in favor of building up abstractions around memory management - sometimes even garbage collection.

Okay now I've contradicted myself. I suppose a more fair statement would be that memory management strategies have implications. A programmer should pick a strategy for a given problem and realize that it's a design decision with trade-offs. Picking garbage collection has good and bad things about it, but like most design patterns, it is not appropriate for all code (and I would even say it's not appropriate for all OO code) and while it makes some things easier, it makes other things harder.

Thursday, December 14, 2006

Instrumentation

nVidia has a very cool tool called NVPerfHUD - it's an application that provides on-screen diagnostics and debugging for graphics-intensive applications. Unfortunately for us it has two problems:
  1. It's Windows only and we do 99% of X-Plane development on Macs.
  2. It's nVidia only and we have more ATI hardware in our Macs than nVidia. (Not our fault - that's what Apple ships!)
Fortunately (and typically for an application that's gone through 8 major revisions) X-Plane already has a lot of these things built right into the app. When working on a long-term code base, the investment in built-in diagnostic code is well worth it...perhaps these will give you some ideas on how to add instrumentation to your application.

All of X-Plane's instrumentation is zero-overhead when not used, and relatively low overhead when used, and it ships in the final application. We do this because we can, and also because it allows us to debug in-field apps without having to send out special builds.

Stats Counters
X-Plane uses the plugin dataref system to export a series of private stats counters to a diagnostic plugin for on-screen analysis. The stats counters show everything from the number of cars drawn to the number of segments of the planet view that are rendered.

Stats counters give us a better picture of the internal state of the application. If a user reports a slower framerate, the stats counters can help us tell why: is it because we're drawing too many cars, or because the planet is being drawn?
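
As an illustration, here's a minimal sketch of the idea in plain C++ - names are invented, and this is not X-Plane's actual code or the plugin SDK's API:

#include <map>
#include <string>

// A named counter that registers itself so a diagnostic layer can find it.
class stat_counter {
public:
    stat_counter(const std::string& name) : count_(0) { registry()[name] = this; }
    void increment(int n = 1) { count_ += n; }
    int  value() const        { return count_; }

    // The on-screen diagnostics iterate this map once per frame.
    static std::map<std::string, stat_counter*>& registry()
    {
        static std::map<std::string, stat_counter*> r; // built on first use
        return r;
    }
private:
    int count_;
};

// Usage: bump it in the render loop, read it from the diagnostic display.
static stat_counter s_cars_drawn("sim/stats/cars_drawn");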

Art Tuning
We also use datarefs to export a series of tuning values for our artists. They can adjust the overall look of lights, cars, the propeller, etc. via these variables. This lets them work in real time, tuning the sim and seeing changes immediately. Once they reach values they like, we set them as the defaults in the sim.

Perf Flags
OpenGL is a pipeline - if any stage of that pipeline slows down, your framerate sinks. So in order to figure out why X-Plane is slow, we need to know which part of the pipeline is overloaded. To that end we have a series of performance flags (again datarefs) that can be set to intentionally change the loading of the pipeline (see the sketch after this list). This is an idea inspired by NVPerfHUD, but implemented directly in our engine.
  • One flag will turn off the flight model, lowering CPU load.
  • One flag will change the clip volume, limiting the amount of vertex processing (and all that follows).
  • Another flag will replace all textures with a 2x2 proxy, relieving pressure on AGP bandwidth and on-card VRAM bandwidth.
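
Concretely, the plumbing might look something like this - a sketch with invented names, not X-Plane's actual flags:

struct perf_flags {
    bool disable_flight_model;  // lowers CPU load
    bool shrink_clip_volume;    // limits vertex processing
    bool proxy_textures;        // 2x2 textures relieve AGP/VRAM bandwidth
};
static perf_flags g_perf = { false, false, false };

static void update_flight_model() { /* CPU-side physics */ }
static void draw_scene(bool /*proxy_tex*/, float /*clip_scale*/) { /* issue GL here */ }

void run_frame()
{
    if (!g_perf.disable_flight_model)
        update_flight_model();
    // If framerate jumps when a flag is flipped, that stage was the bottleneck.
    draw_scene(g_perf.proxy_textures,
               g_perf.shrink_clip_volume ? 0.1f : 1.0f);
}
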
FPS Test
X-Plane ships with a command-line based framerate test. The framerate test controls all sim settings and automatically logs framerate. The framerate test gives us an easy way to regress new code and make sure we haven't hurt performance. It also gives us a definite way to assess the performance of machines in the field.

Hidden Commands
X-Plane exports some hidden commands via the plugin system. (You must have our internal plugin to use them right now.) For example, all pixel shaders can be reloaded from disk without rebooting the sim, which speeds up the development cycle a lot. This kind of functionality is built right into our engine - our shader object understands reloading.
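
To give the flavor, here's a rough sketch of a reloadable shader object - a hypothetical class using the GL 2.0 entry points (the 2006-era ARB-suffixed calls are analogous), not our actual implementation:

#include <string>
#include <fstream>
#include <sstream>
#include <OpenGL/gl.h>   // <GL/gl.h> on other platforms

class shader_obj {
public:
    shader_obj(const std::string& path) : path_(path), program_(0) { reload(); }

    // Bound to a hidden command: re-read the source and rebuild the
    // program without restarting the sim.
    void reload()
    {
        std::ifstream     f(path_.c_str());
        std::stringstream ss;
        ss << f.rdbuf();
        std::string  src = ss.str();
        const char * s   = src.c_str();

        GLuint sh = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(sh, 1, &s, NULL);
        glCompileShader(sh);

        if (program_) glDeleteProgram(program_);
        program_ = glCreateProgram();
        glAttachShader(program_, sh);
        glLinkProgram(program_);
        glDeleteShader(sh);   // the program keeps what it needs
    }
    void use() const { glUseProgram(program_); }
private:
    std::string path_;
    GLuint      program_;
};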

Compiler Flags
A few more invasive debugging techniques require #defines to be flipped inside the sim. This includes a lot of logging options (all that file output kills framerate, so we don't even mess with leaving this on or piping it into the bit bucket) which let us really see what's going on all the way down the scene graph. We can also turn on things like wire frames and stepped drawing (drawing the frame one part at a time and swapping the buffer to see the results).

Adaptive Sampling Profiler
The last tool we have is remote scripting of Shark, Apple's adaptive sampling profiler, via a plugin. I can't say enough good things about Shark, it's just a really great tool. Via the plugin system we can script Shark profiling, giving us very accurate profiling of specific blocks. This stuff is normally off and has to be #defined on, since it's a bit invasive (e.g. when we have Shark attached we don't want to profile every single part of the app, because we'll spend all our time waiting for Shark to process the captured samples).

If there's a moral to the story, I suppose it's that it only takes a few more minutes to change a hacked up, temporary, one-off debugging facility into a permanent, reusable, scalable, clean debugging facility, but you get payback every time you work on the codebase. And the payoff for writing code that's designed for analysis and debugging from day one (especially for OpenGL, where so much of the subsystem is opaque, and bugs usually manifest as a black screen) is even greater.

Tuesday, December 12, 2006

Hemophiliac Code

I managed to slice myself pretty thoroughly while trying to make bagel chips tonight. Besides being surprised both at how deep the cut was and at how stupid I am, I had another thought tonight as I type, thumb in a bandaid but otherwise working normally: my thumb's self-repair system works really, really well.

Compare that to a piece of code. You're running along in a happy function and you hit a null object pointer. But you're really supposed to call that method, unconditionally. What to do? Call it and we bus error. Don't call it and, well, we've defied the logic of the program!

The advantage my thumb has over my code is that it knows pretty much what the right thing to do is under certain (predictable) problem conditions. Blood exposed to open air...probably we've been cut - let's clot. (This is similar to a pilot experiencing an engine failure. It's not good, but it's not unexpected, so it's possible to respond in a way that will maximize the chance for success.)

Given that there is a whole category of code defects that we can detect but cannot hope to repair, most programmers take the opposite approach: if we can't hope to survive damage, let's make sure we die every single time! The logic is, better to know that we're getting injured, even if the symptom is the program dying in the lab, than to have unknown damage under the surface that will cause death in the field.

Perhaps a reasonable approach would be, "die early, die often". We never want to have an internal defect and not report it, and we want to report it as early as possible, as that's when we can do the best job of reporting it. Early detection is a good thing in debugging.
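
In code, the cheapest version of this policy is just an assertion that refuses to be compiled out - a minimal sketch (a hypothetical macro, not X-Plane's actual one):

#include <stdio.h>
#include <stdlib.h>

// Report the defect at the detection site, where __FILE__/__LINE__ and the
// live call stack are most useful, then stop on the spot.
#define DIE_EARLY(cond, msg) \
    do { if (!(cond)) { \
        fprintf(stderr, "internal error: %s (%s:%d)\n", (msg), __FILE__, __LINE__); \
        abort(); /* die in the lab, not mysteriously in the field */ \
    } } while (0)

// Usage: DIE_EARLY(mesh != NULL, "worker produced a null mesh");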

Early detection has become even more important in X-Plane as we start to thread our code. To utilize dual-core hardware, we do some of the CPU-intensive work of constructing our 3-d scenery details on the second core. The main thread farms this to a worker thread, who then tosses it back to the main thread to insert into the scene graph between frames.

The problem is: if something goes wrong during scene-graph insertion, we really don't have any idea why. We don't know who called us, because we've just got a chunk of finished geometry (and they all look the same) and the actual code that did the work exited long ago, leaving no call-stack.

Early detection is thus a huge benefit. If we can get our failure on the worker thread as the instantiation happens (rather than later as we edit the scene graph) then we can break into the debugger and play in a wonderland of symbols, local variables, and data.

(Final off topic thought: why is this code bad? Hint: it's not the algorithm that's bad.)

#include <math.h>
#define PI 3.14159265358979

inline float sqr(float x) { return x*x; }
inline float pythag(float x, float y, float z) {
return sqrt(sqr(x)+sqr(y)+sqr(z)); }
float angle_between(float vec1[3], float vec2[3])
{
float l1=pythag(vec1[0],vec1[1],vec1[2]);
float l2=pythag(vec2[0],vec2[1],vec2[2]);
if(l1 != 0.0) l1 = 1.0 / l1;
if(l2 != 0.0) l2 != 1.0 / l2;
float v1[3] = { vec1[0] * l1, vec1[1] * l1, vec1[2] * l1 };
float v2[3] = { vec2[0] * l2, vec2[1] * l2, vec2[2] * l2 };
float dot = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2];
return acos(dot) * 180.0 / PI;
}

Monday, December 11, 2006

Intrinsic Linked Lists for Static Construction

This is another excuse to make sure the blogger move to beta hasn't killed all my blogs. In the past I ranted about not being able to move to blogger beta (this blog moved, the others did not). Now I am happily united entirely on the beta blogger...web bliss is mine. (It would be nice if the old posts listed the correct authors, but that's what we get for flirting with WordPress.)

A while ago I wrote a lot trying to explain how the hell global static construction works in C++.
The best simple summary I can give you is:
  • It doesn't do what you think.
  • Your code will explode in ways you didn't expect.
I also tried to explain the joys of intrinsic linked lists, that is, structs that contain a next pointer. (Don't pass these up for the STL - sometimes the old-school technique works better, especially when the issue isn't O(n) but how good your implementation is. Are you sure your app isn't bottlenecked by memory allocation?)

Like peanut butter and chocolate, these two ideas go well together. That is...static construction problems can be fixed by using intrinsic linked lists. This code is guaranteed to make your life hell:

class foo {
static set<foo*> all_of_me;
foo() { all_of_me.insert(this); }
~foo() { all_of_me.erase(this); }
// more stuff
};

The idea is that the foo class self-tracks all its members...this is all good until you do this in some other CPP file besides the CPP where all_of_me is defined.

static foo this_static_var_will_kill_me;

First the solution, then why it works. The solution is simply this:
class foo {
static foo * first_of_me;
foo * next;
foo() { this->next = first_of_me; first_of_me = this; }
// more stuff
};

Now foo uses an intrinsic linked list instead of an STL set and we can declare static global foo objects all the time!

Analysis:
  • The problem with static construction is that C++ doesn't guarantee that static global objects will be initialized in any particular order.
  • When a class has a static member variable that is in turn a complex class, that static member variable is effectively a static global object, and thus it will be constructed in no particular order.*
  • In the case of our "foo" example, if the particular foo object is constructed before the set "all_of_me" is constructed, then foo's constructor will try to stick "this" into a set whose contents are probably entirely zero.
  • If that doesn't crash then when the set is constructed (after the foo object) the set will be fully reinitialized, causing our object to be lost.
(For this reason we hope for the crash - unfortunately some STL containers like vector will often fail quietly when they are used before initialization, as long as they're zero'd out first.)

The beauty of intrinsic lists is: C++ guarantees it will do all of the static initialization (that is, zero-setting, etc.) before any functions or methods are called. So we can be sure our list head pointer is NULL, and that's all we need to get started.
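
And walking the list costs nothing extra - a sketch, where debug_all_foo and do_something are hypothetical members standing in for whatever per-object work you need:

void foo::debug_all_foo()
{
    for (foo * f = first_of_me; f != NULL; f = f->next)
        f->do_something();
}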

One final note - as I define categories for this post I see Chris has categories for both "rants" and "IIS". I can't imagine that you'd ever want IIS without the rant tag.

* Okay, so there are some rules on construction order. Trust me, they won't help you do anything productive!

Tuesday, November 28, 2006

Returning an Array in C - the Least of 3 Evils

In C++ you can write things like this:
void get_some_strings(vector<string>& out_strings)
{
out_strings.clear();
for (some loop)
out_strings.push_back(whatever);
}
Easy! How do we do this if our API has to be pure C? Well, there are three options.

First you could always allow the function to allocate memory that's deallocated by the caller. I don't like this option at all. To me, it's a walking memory-leak waiting to happen. You'll have to check for memory leaks every time you use the function and there are no compile-time mechanisms to encourage good programming technique.

A second option is to pass in some preallocated buffers. Hrm - I think I tried that once. Unfortunately as you can read here, the technique is very error prone and it took Sandy (the better half of the SDK) to clean up the mess I made.

The third option, and least evil IMO is to use a callback. Now we have something like this:
void get_some_strings(void (* cb)(const char * str, void * ref), void * ref)
{
for (some stuff)
cb(some string, ref);
}
If you can tolerate the C function callbacks (have a few Martinis - they'll look less strange) this actually provides a very simple implementation, which means less risk of memory-management bugs.

On the client side if you're using this library from C++ you can even turn this back into the vector code you wanted originally like this:
void vector_cb(const char * str, void * ref)
{
vector<string> * v = (vector<string> *)ref;
v->push_back(str);
}
Now I can use this one callback everywhere like this:
vector<string> my_results;
get_some_strings(vector_cb, &my_results);
What I like about the callback solution is that both the client code and the implementation code can be written in a simple and concise manner, which translates into solid code.

Tuesday, November 07, 2006

Surprise - pthread_cond_timedwait takes an absolute time!

Whoever reads the man pages anyway? Turns out pthread_cond_timedwait takes an absolute time! In other words, if you want it to sleep for 1 second, you have to pass one second more than the current time, as returned by gettimeofday (yuck) and converted from a timeval to a timespec (double yuck).

As much as I gripe about this because most threading APIs take relative timeouts (as does select), there actually is a use for this.

When writing a thread-safe message queue you might write something like this to read the queue:
lock critical section.
increment waiting thread count.
while(we don't have a message)
if(pthread_cond_timedwait(condition var, timeout)==ETIMEDOUT)
decrement thread count, unlock, return "timed out"
decrement waiting thread count
read message out of queue
unlock critical section
Now...you might wonder, why do we need a while loop? The answer is: it's possible that between the time that our thread is woken up (via the condition variable wait) due to a message being queued and the time we actually run, get the lock, and continue execution, another thread could have come along and stolen our message. (Note that another thread can go through this code without ever calling pthread_cond_timedwait if there is already a message, which is good for performance. This is not a FIFO message queue!) Thus we have to while-loop around until we reach a point where we've woken up, acquired the lock, and the message is still there. Once we have that lock, the message isn't going anywhere and we can safely exit the loop.

This is where the absolute time is handy - we might go around the loop 6 times. But the "deadline" - the time after which there's no point in waiting, we should return to the user, is an absolute time, and can be invariant across the loop. Thus there is no need to measure how much time has gone by on each loop and decrement our wait time.
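
For concreteness, here's how the deadline might be computed once, up front - standard POSIX calls, error handling omitted:

#include <sys/time.h>
#include <time.h>

// Build an absolute timespec "seconds_from_now" in the future; compute it
// once and reuse it on every pass through the while loop above.
struct timespec make_deadline(double seconds_from_now)
{
    struct timeval  now;
    struct timespec deadline;
    gettimeofday(&now, NULL);                     // yuck (timeval)
    deadline.tv_sec  = now.tv_sec + (time_t)seconds_from_now;
    deadline.tv_nsec = now.tv_usec * 1000L        // double yuck (convert)
        + (long)((seconds_from_now - (time_t)seconds_from_now) * 1e9);
    if (deadline.tv_nsec >= 1000000000L) {        // carry the nanoseconds
        deadline.tv_nsec -= 1000000000L;
        deadline.tv_sec  += 1;
    }
    return deadline;
}

// Usage: compute it before the loop, then pass &deadline to
// pthread_cond_timedwait on every iteration.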

Tuesday, October 10, 2006

OpenGL Fogging Artifacts

Here's a video of a very, very low visibility approach in X-Plane. Note how the "fog" (that is, the mixing of the runway and ground to gray) pulses in and out as we fly, and the runway and ground don't do it at the same time. What's going on here?



What you're seeing is a defect in the fixed-function pipeline. The problem is two-fold:
  1. OpenGL implementations are allowed to calculate fog colors at vertices and do a simple interpolation between the vertices.
  2. The vertices that we interpolate between are not necessarily the corners of your triangle; they could be the vertices that OpenGL adds when it clips your triangle to the view frustum.
So we have two sets of artifacts at once. First consider the case of the ground and runways. Since the fogging "interval" (the distance between fog = 0% and fog = 100%) is quite small here, the same amount of fog is spread along the entirety of a runway triangle (about 50 meters deep) and a ground mesh triangle (at least 90 meters deep, but possibly up to 1 km deep). That means that we go from visible to fogged much faster over the runway than over the ground.

As we fly, the actual size of the mesh triangles is changing, as part of each mesh triangle scrolls off screen. This in turn affects the gradient of how fast we fog and what the corner fog colors are.

The results are, well, that video: fog doesn't match between the runways and the ground, and the particular strange results vary as we fly.

The solution is, like all things in life, to replace the fixed-function pipeline with a pixel shader. The pixel shader can then use a per-fragment value (like the depth value) to fog. This is more expensive (well, probably not really...we have the depth value around and it's the same number of DSP ops) but will produce consistent fog across the entire area.
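
To see the scale of the error, here's a tiny worked example (illustrative numbers only) comparing per-vertex fog interpolated across a big triangle against true per-fragment fog:

#include <stdio.h>

// Linear fog factor: 0 = clear, 1 = fully fogged.
static float fog_factor(float depth, float start, float end)
{
    float f = (depth - start) / (end - start);
    return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
}

int main()
{
    const float start = 100.0f, end = 200.0f;    // a 100 m fogging interval
    // A 1 km ground triangle spanning depths 50 m to 1050 m:
    float fog_near = fog_factor(  50.0f, start, end);  // 0.0
    float fog_far  = fog_factor(1050.0f, start, end);  // 1.0
    // Vertex fog, interpolated at a fragment 150 m deep:
    float lerped = fog_near + (fog_far - fog_near) * (150.0f - 50.0f) / 1000.0f;
    // True per-fragment fog at the same depth:
    float exact  = fog_factor(150.0f, start, end);
    printf("interpolated: %.2f   per-fragment: %.2f\n", lerped, exact);  // 0.10 vs 0.50
    return 0;
}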

Monday, October 09, 2006

VBOs, PBOs and FBOs

(The perfect is the enemy of the good - in preparing my 10-part series on why iostreams totally suck, I've been putting off blogging anything else.)

Apple has posted two very nice bits of sample code demonstrating PBOs and FBOs. Even though these are Mac-specific, as sample code they're good on any platform because PBOs and FBOs are OpenGL extensions, not windowing system extensions.

So what are all these objects? Here's the situation:

Vertex Buffer Objects (VBOs)

The VBO is the greatest thing to come to OpenGL since sliced bread. Basically it's a memory buffer containing geometry data that is managed by the driver. You can either tell OpenGL to copy data into the buffer, or temporarily memory map it and write to it. A VBO acts as an "alternate memory space" - that is, you tell OpenGL to read out of the VBO rather than your process's virtual address space. (In practice the VBO may be in your process's address space too, but this is hidden from your app.)

VBOs rule because:
  • They allow OpenGL to act asynchronously. When you use client vertex arrays, the driver has to copy all of your data immediately, before the function returns. Because OpenGL owns the memory in a VBO it can schedule the VBO to be read later and be sure it won't be tampered with. (After all, the only way to "edit" the VBO is via OpenGL.)
  • VBOs can be in VRAM or at least placed in memory that's easy for the card to get at. This means potentially much faster drawing.
It's important to note that VBOs as objects are unformatted...that is, they are just a big chunk of bytes. OpenGL doesn't know or care whether they contain float data, what the ordering is, etc. until the instant you say "draw". (It's an absurd example, but you could even draw the same VBO with the data under multiple interpretations.)
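
A minimal round trip looks like this - a sketch using the GL 1.5 entry points (2006-era code would use the ARB-suffixed equivalents):

#include <OpenGL/gl.h>   // <GL/gl.h> on other platforms

// Create a buffer and hand the driver a copy of our vertex data.
GLuint make_vbo(const float * verts, int vert_count)
{
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, vert_count * 3 * sizeof(float),
                 verts, GL_STATIC_DRAW);          // driver now owns a copy
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return vbo;
}

// The format is only specified here, at draw time - the buffer itself is
// just untyped bytes.
void draw_vbo(GLuint vbo, int vert_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);  // an offset, not a pointer
    glDrawArrays(GL_TRIANGLES, 0, vert_count);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}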

PBOs (Pixel Buffer Objects):

The PBO extension doesn't really make a new kind of OpenGL object. It just says "we can use buffer objects for pixel operations, not just geometry operations." Normally a VBO can only be used to provide the data to something like glDrawArrays, that is, as vertex data. PBOs say you can also use a buffer object as the memory for glReadPixels (copy from framebuffer to memory), glDrawPixels (copy from memory to framebuffer), glTexImage2D (copy from memory to a texture) and glGetTexImage (texture to memory).

There are some interesting applications for this (if the driver is clever):
  • Read-pixels can be very slow because we have to finish drawing before we can read the memory. Since the memory in a PBO is owned by the driver and isn't accessed by the application without OpenGL's knowledge, a read-pixels from the framebuffer to a PBO can be asynchronous; the driver will schedule it for once drawing is done. (But see comments on timing below.) This is an important case when OpenGL's output will go to something other than the screen, like making a movie file.
  • Texture upload can be slow; a PBO allows this to happen asynchronously, which could allow for some kind of fast threaded texture setup. This could also allow for more efficient processing when textures are changed every frame (e.g. playing a video file in an OpenGL scene).
  • The case that Apple shows in their demo: PBOs and VBOs are not different OpenGL objects - they're both "buffer objects" (that is, untyped blocks of memory) used for pixel or vertex data. So you can draw something to the screen, read it (using glReadPixels) into a PBO and then draw it as a VBO (binding the buffer as the vertex-array source). Since the buffer is never touched directly by the application, this "use an image as vertices" trick happens 100% on the graphics card and can be very fast, with no trips over the graphics bus.
One thing to note, however, is that a PBO is not a texture; you must copy from the PBO to a texture before you can draw with it. These are accelerated copies, but they are still copies.
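
Here's what the asynchronous read-back case might look like - a sketch with the GL 2.1 token names (the ARB-suffixed versions are the same idea):

#include <string.h>
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

// Kick off the read: glReadPixels returns immediately, copying into the
// PBO whenever drawing finishes.
void read_frame_async(GLuint pbo, int w, int h)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, w * h * 4, NULL, GL_STREAM_READ);
    glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, (GLvoid *)0);  // offset into the PBO
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

// Later - ideally a frame later - collect the pixels. The block, if any,
// happens here at glMapBuffer, so do useful work between the two calls.
void fetch_frame(GLuint pbo, int w, int h, void * dst)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    const void * p = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (p) {
        memcpy(dst, p, (size_t)w * h * 4);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}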

Buffer Objects and Timing

Buffer objects can be used asynchronously until your application tries to use them. When you map them or try to read their data, OpenGL will have to block your thread until the execution of OpenGL commands has caught up. So in order to enjoy the benefits of asynchronous processing, you need to do "some other stuff" between a call to glReadPixels and a call to glGetBufferSubData or glMapBuffer.

(This is very similar to occlusion queries, where if you ask OpenGL how many pixels were drawn right after you draw them, you'll block until the pixels are really drawn. It's always best to assume OpenGL is lagging behind you. To get around this, X-Plane always asks about the number of pixels from the last frame before counting the number in this frame. This gives OpenGL an entire frame's worth of time to do the counting...by that point we can be sure the drawing has been done.)
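
That one-frame-lag trick, sketched with the GL 1.5 occlusion-query calls:

#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

// Ask for LAST frame's count before issuing THIS frame's query - the
// driver has had a whole frame to finish counting, so we never stall.
GLuint count_pixels_lagged(GLuint query, bool first_frame, void (* draw_it)(void))
{
    GLuint pixels = 0;
    if (!first_frame)
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &pixels);
    glBeginQuery(GL_SAMPLES_PASSED, query);
    draw_it();
    glEndQuery(GL_SAMPLES_PASSED);
    return pixels;   // one frame stale, but stall-free
}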

FBOs (Framebuffer Objects) and, um, Renderbuffer Objects

Framebuffer objects represent places you can draw to other than the screen. This is the extension we've all been waiting for: with framebuffer objects you can draw directly into a texture.

An FBO is actually distinct from a texture - it represents all of the images you draw into at once. (Remember that when we draw we usually have an RGBA image buffer and a 32-bit depth buffer that is separate, and maybe other buffers too.) So the FBO lets us draw into an RGBA texture and a DEPTH-type texture at the same time, or even mix and match.

The FBO extension also lets us create buffers that are not textures for off-screen rendering.

FBOs represent one of the nicest ways to draw into a texture because they're simple and require no copying from the framebuffer to the texture. The FBO extension also has some functions to handle mipmap generation, so you can draw once and let OpenGL build the mipmap pyramid.
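
Setup, sketched with the EXT_framebuffer_object entry points (error handling trimmed to a completeness check):

#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

// Attach an RGBA texture and a DEPTH texture, draw, then let GL rebuild
// the mipmap pyramid - no copy from the framebuffer to the texture.
bool begin_render_to_texture(GLuint fbo, GLuint color_tex, GLuint depth_tex)
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, color_tex, 0);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                              GL_TEXTURE_2D, depth_tex, 0);
    return glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) == GL_FRAMEBUFFER_COMPLETE_EXT;
}

void end_render_to_texture(GLuint color_tex)
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);   // back to the window
    glBindTexture(GL_TEXTURE_2D, color_tex);
    glGenerateMipmapEXT(GL_TEXTURE_2D);            // build the pyramid once
}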

In some ways FBOs make PBOs obsolete: the most modern cards allow vertex shaders to read textures. This means that you can use a texture as geometry data directly. And with FBOs we can draw right to a texture. So rather than the old way (draw to the screen, read to a PBO, use it as a VBO, which involves one copy) we can avoid the copy (draw to a texture, then draw again using that texture for vertex data).

On the other hand I'm not sure that FBOs will replace PBOs that fast; PBOs are an older extension and thus more available, and vertex-shader-based texture reading is only available in 7th generation cards (6th for nVidia). So using FBOs avoids a copy but cuts out a whole set of graphics cards, including the very capable 9700-type cards from ATI (we see a lot of these in-field). It seems that a driver could do a reasonable implementation of PBOs even on older hardware, and with the wide card-to-computer bandwidth of PCIe even a bus transfer isn't the killer it used to be.

One last musing: I don't know how good the performance of texture-based vertex processing will be vs. streaming a VBO. With X-Plane we do occasionally see a bottleneck trying to read texture memory...that is, the card can't get pixels from its own high-speed memory to its own high-speed shaders via its own high-speed bus fast enough. This is pretty amazing on its own because the memory controllers on a graphics card are extraordinarily fast. But the combination of high anisotropic filtering and full-screen anti-aliasing (combined with a healthy dose of overdraw by poorly behaved applications :-) can really stretch texture bandwidth. So I don't know whether using texture memory for vertex information will cause a further bottleneck.

(One guess is that the cards that can read from texture memory in the vertex shader are very new and have GDDR3 memory, so texture memory is fast enough that PCIe 16x bandwidth isn't that important.)

Tuesday, September 19, 2006

Triple-Boot Mac (and MBR hell)

It only took about 8 hours of total time, but my MacBook Pro can now triple-boot into OS X 10.4.7, Windows XP Home SP2, and Ubuntu Linux (6.06 I think).

The install is actually not that bad once you've been through it once...the process is basically:
  • Use diskutil to dynamically build two new partitions. The Mac comes with one "EFI" partition for boot control and one HFS+ partition. MBR gives you four max, so you build a Linux one followed by a FAT32 one.
  • Install Windows XP. Tedious and annoying but not complicated. Windows takes longer to boot from CD than an entire OS X install from scratch.
  • Install Linux. This is the slightly dangerous part, as I found out.
So two things took a long time:

First, for some reason my brand new, genuine Windows XP CD-ROM doesn't work very well. Setup had at least two hangs, a failure to init a disk, a BSOD, and an assertion failure. Maybe all of those holograms on the disk play havoc with the optical drive? So the biggest single item was a 5 hour Windows install that mostly involved trying to boot setup over and over.

Windows XP asks questions during the install, so you have to sit there and watch it. Furthermore, if you are using rEFIt like I am, then you'll have to manually select Windows each time the installer reboots itself. The installer normally expects to reincarnate itself, so if your machine mysteriously reboots without warning be sure to keep reselecting to boot from the Windows install CD-ROM.

The other problem was that, almost certainly due to my own sleepiness this morning, I managed to splat LILO onto the boot block of the Windows partition. Oops. Most tutorials online state that the only fix for this is a complete reinstall of both Windows and Linux, but fortunately I discovered this is not true.

It turns out that you can very gently run fixmbr on your Windows partition without disturbing triple-boot goodness. It's pretty much that simple...just launch the Windows setup CD, wait 30 minutes for it to load every driver ever written, launch the recovery tool and use fixmbr on the appropriate drive. Linux and Mac continue to boot via rEFIt.

Now the only issue is: Windows can't understand Mac or Linux partitions, Linux can't understand Mac partitions, and while the Mac will figure anything out, it's not automounting.

Monday, September 11, 2006

Remote Debugging with WinDBG

This is not really a tutorial but rather a quick summary of how to do this so I can find this information quickly in the future:

On the "debugging server" (the PC with the problem)
  1. Start WinDBG
  2. Type .server npipe:pipe=pipename
On the "debugging client" (the one you're sitting at)
  1. Start WinDBG
  2. Go to File>Connect to Remote Session
  3. For Connection String enter npipe:server=Server, pipe=PipeName [,password=Password] where Server is the hostname/IP and PipeName is the name that the server chose. The password section is optional.

Friday, August 18, 2006

MacBook Pro

This is just me blogging about getting a new toy...I just upgraded my work Mac laptop to one of the new 15" MacBook Pros. Some random musings...

They're really, really nice machines. Apple does a great job on the "appliance" aspect of the machines. The power cable attaches magnetically. The machine comes with a remote control that integrates with every media app that ships with the system. (All of which can share media from a remote server right out of the box. Digital house, anyone?) As always with Apple, power management is good - Apple was preaching power management 10 years ago when no one knew what that was about, so by this point it's down in all levels of the software.

The MacBook Pros cost a bloody fortune ... I wouldn't own one if it wasn't for work. But the pro models come with a Radeon X1600, a very nice graphics chip for flight sim. By comparison the regular MacBooks ship with Intel integrated graphics, which are just crap...I'll have to rant on that some other time. Bottom line is - the non-pro MacBooks really aren't usable for flight simming, let alone flight sim dev. But for non-flight-sim users, the regular MacBooks aren't a bad way to get a nice Apple media-friendly machine.

Performance is great. Compile and sim speed rival the G5, a full sized heavy-duty desktop, and should be even better when the second gig of mem comes in the mail.

One quirk that I've seen in web forums too: when the machine is on batteries the wireless network will hang up if left idle for too long. Leaving a ping running keeps the card alive, but email checks aren't frequent enough. I haven't found a setting to keep wireless alive all the time.

EDIT: there is one way the G5 beats the MacBook Pro - obvious, but: virtual memory performance from the 7200 RPM SATA drives in the G5 is light-years ahead of the 5400 RPM drive in the laptop. This is one case where a desktop comes through; the G5 is a "no-wait" machine because the paging performance is so good. But that's a form-factor issue.

Thursday, August 17, 2006

Decoding Error Codes

While I'm on the topic of Windows shortcuts, I'll talk about a command prompt tool that'll decode error codes for you quickly.

net helpmsg <code>

Will respond back with a sentence explaining the error. For example:

net helpmsg 17
The system cannot move the file to a different disk drive.

Very handy! The only catch is that the error code needs to be in decimal...not hex!

Service Control - SC

Windows comes with a built-in tool known as the Service Control utility (sc.exe) which gives the user access to some pretty interesting capabilities and information about services. One can Create, Stop, Start, Query or Delete any Windows service.

First the syntax:
SC [\\server] [command] [service_name] [Options]
Now the commands:
query [qryOpt]    Show status
queryEx [qryOpt]  Show extended info - pid, flags
GetDisplayName    Show the DisplayName
GetKeyName        Show the ServiceKeyName
EnumDepend        Show dependencies
qc                Show config - dependencies, full path, etc.
start             Start a service
stop              Stop a service
pause             Pause a service
continue          Continue a service
create            Create a service (add it to the registry)
config            Permanently change the service configuration
delete            Delete a service (from the registry)
control           Send a control to a service
interrogate       Send an INTERROGATE control request to a service
Qdescription      Query the description of a service
description       Change the description of a service
Qfailure          Query the actions taken by a service upon failure
failure           Change the actions taken by a service upon failure
sdShow            Display a service's security descriptor using SDDL
SdSet             Set a service's security descriptor using SDDL
The output of sc query messenger looks something like this:
SERVICE_NAME      : messenger
TYPE              : 20  WIN32_SHARE_PROCESS
STATE             : 4   RUNNING
                    (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE   : 0  (0x0)
SERVICE_EXIT_CODE : 0  (0x0)
CHECKPOINT        : 0x0
WAIT_HINT         : 0x0
It can be useful from time to time to get more in-depth information about a service than the administration panel provides.

Wednesday, August 16, 2006

I/O Error (-36) when writing on Mac

For months X-Plane's web-based installer had been randomly returning I/O errors to users with the new Intel Macs. The specific offending routine was FSWriteFork, but we couldn't figure out why.

We eventually turned to DTS, Apple's third party developer support team - you can buy a tech support "incident" from them - that is, assistance with one bug. It was from them that I got the missing tidbit of information:

Usually ioErr (-36) means something has gone deeply wrong at the lowest levels. Here's what the actual header file says if you have Apple's Carbon SDK:

ioErr = -36, /*I/O error (bummers)*/

That is, generally speaking, application programs can't recover from this - it would usually mean the hard drive is going bad. By comparison paramErr (-50) means you passed something illegal to the function.

Well it turns out that if you pass a block of memory to FSWriteFork and part of that memory is not mapped (that is, trying to access that memory to do the disk write generates a segfault) then you get ioErr back, not paramErr. That tip led us to realize that what we really had was (blushes) bad endian-swapping code that (on little-endian Macs only) generated bogus pointers. Depending on luck and the stars, that bogus memory might be mapped (thus writing junk into the file) or unmapped (generating the -36 error), hence the unreliable behavior.

In hindsight this isn't surprising, once you realize that Mac Carbon file calls are written on top of BSD file manager calls. I don't know the actual implementation of FSWriteFork, but what if it uses BSD's "write"? Well, that can return EFAULT if you pass in unmapped memory.

Tuesday, August 15, 2006

Why I hate the new Google Blogger System

(And I know the Google team is listening to this right, because they can listen to everything?)

It turns out that if you are invited to the new blogger beta and you have not made your own blog first or switched (which you cannot, since slots are limited), you will be in a state where you have access to a blog but have not signed the terms of service! The result is that the edit bar
will appear but any attempt to do anything will put you in a login dialog box that will simply repeat forever, giving no indication of why the login failed to work.

The solution is to start to create a new blog, just to accept the terms of service. Then abort that process and go back to the blog you were invited to and it will work. Obvious!!

Friday, June 16, 2006

Cargo Cult Programming

Sadly, I know a lot of people and companies that have this style of programming:

http://en.wikipedia.org/wiki/Cargo_cult_programming

Threads are a big issue with me. Too many people use threads for the wrong reasons. If you think you need another thread, you had better think carefully and hard about alternatives, because most applications can run just fine single-threaded if done properly. Multithreaded programs introduce far more complexity, which may outweigh the benefits. Of course there are times when multithreading is absolutely necessary, but UNDERSTANDING the right time and wrong time to use threading makes the difference between a good programmer and a not-so-good programmer.

Friday, May 26, 2006

Logging and Divide and Conquer for Performance

There are only two true debugging techniques: log intermediate results and divide and conquer the buggy code. They work well for performance tuning too.

For performance-tuning X-Plane our primary tool is profiling. Some of this is done by manually logging timing points (read off high-fidelity counters). But more useful is Shark, Apple’s adaptive sampling profiler.

One of the trickier things about performance-tuning an OpenGL application is that its speed is affected by both CPU and GPU. The nice thing about Shark is that since it samples over time (and not per function call), our framerate doesn’t decrease when we use it. If our framerate decreased, the ratio of CPU to GPU work would change and the profile would be invalid. Also, Shark can sample within a function, which is crucial since we inline very heavily in our tight loops.

Good profiling is critical to performance; we can make almost anything fast but we don’t have time to make everything fast. And what’s slow is rarely what you would think is slow. For example, I just did a profile of 8000 cars to determine whether we can render the full 3-d headlights and taillights at a distance. Surprisingly, the cost of setting up the lights in 3-d is almost nil; assuring that the car is not unnecessarily drawn turns out to be the performance-critical factor. (Given how many more lights there are than cars, since cars themselves are culled out when far away, the fact that the 3-d math isn’t a hot loop is surprising!)

In this picture you can see a Shark profile of X-Plane where we’re pushing back a lot of items onto a vector that hasn’t been preallocated. Thus OS VM functions and reserve() take up 59% of CPU usage and dominate the profile. In a healthy X-Plane, plot_group_layers (our scene-graph iterator) and the various GLEngine calls would dominate.

One problem with profiling is that if you can’t duplicate the exact rendering settings, you can’t safely compare techniques. To determine the cost of a feature, we divide and conquer. For example, to understand what really costs us - the car or the headlight - I can set X-Plane to only draw the cars when the mouse is in the top half of the screen and only draw the headlights when the mouse is on the right side of the screen. This kind of technique lets us see the instantaneous performance change for a feature, giving us a differential under the exact same conditions (same number of cars, same number of cars on screen, same distance away…). This is the ultimate confirmation that a feature costs or doesn’t cost us.
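
A sketch of that mouse-gated differential (all names invented):

static void draw_all_cars()       { /* ... */ }
static void draw_all_headlights() { /* ... */ }

// Same frame, same camera, same car count - only the feature under test
// changes as the mouse crosses the screen, so the framerate delta is the
// feature's cost.
void draw_cars_and_lights(int mouse_x, int mouse_y, int screen_w, int screen_h)
{
    if (mouse_y > screen_h / 2)   // top half: cars on
        draw_all_cars();
    if (mouse_x > screen_w / 2)   // right half: headlights on
        draw_all_headlights();
}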

Tuesday, May 23, 2006

Fun with global constructors

(Note: for the purpose of this discussion, "global" objects means:

int a;
static int b;
class foo {
static int c;
};
int foo::c;
void func()
{
static int q;
}

For our discussion, a, b and c are "globals" but "q" is not. While all of these will have static storage allocated for them, a, b and c will be initialized during program startup; q will be initialized the first time func() runs - possibly never! I will have to rant on how the word static has 3 syntactical meanings and at least that many language meanings some other time.)

The rules for the construction of C++ global objects go something like this:

  • Plain old data (read: int = 0) gets initialized before dynamic data (int = some_func(), an std::map, etc.). Basically things that can be inited just by splatting their memory are initialized before any code is run.
  • Within a translation unit, dynamic initialization goes in order of the file.
  • Between translation units, dynamic initialization can happen in any order. (Essentially the compiler has no idea which file comes "first".)

That’s enough to make us miserable right there: because the order of static initialization is variable between files, it means that if we have an "API" in a translation unit that requires global data to function, we can’t use it before main() is called because our static initialization code might be running before the API’s. We have no way to control this.

But this is C++ - three rules can’t be everything when it comes to static initialization, right?

  • Dynamic initialization does not have to happen before "main". But it does have to happen before non-initialization code in that translation unit gets called. So going back to our "API" - if we have a global, it will be initialized before the API is used, but it may or may not be initialized before main.
  • Dynamic initialization can be replaced with static initialization (splatting memory) if:
    1. That initialization doesn’t have any side effects on other initialization and
    2. The compiler can figure out what values the dynamic initialization would have produced under some circumstances.
It is worth noting that in this case the compiler can initialize our object to the results of the dynamic initialization, or some static value would be legal too, since this is happening before we are required to have an initialized object. Most compilers I have played with tend to fill such objects with zero, but it doesn’t look to me like the spec requires this.

Okay, now we’ve got something confusing enough to really do some damage. Not only will C++ call our globals’ constructors in a basically random order between files, but: it may call them in a random order within files by deciding that what we thought was dynamic was really static (poof - that constructor goes to the front of the line), and this may or may not be happening before main is called.

(For what it’s worth, at least CodeWarrior always initializes everything before main - it’s easier for them to make a big linked list of globals and run through it, translation unit by translation unit. And frankly, since our global will be built before the translation unit is called, initialization after main is the least of our problems in practice.)

It’s pretty easy to get yourself in trouble with these limitations:

//header
class foo {
public:
foo();
~foo();
static void debug_all_foo();
private:
static set<foo*> all;
};
// CPP implementation
set<foo*> foo::all; // this is global
foo::foo()
{
all.insert(this);
}
foo::~foo()
{
all.erase(this);
}
void foo::debug_all_foo()
{
for (set<foo*>::iterator i = all.begin(); i != all.end(); ++i)
(*i)->do_something();
}
// Usage - in a separate CPP file
static foo my_obj; // also global

The idea here is very simple: foo objects maintain a global set of their own pointers - so we can do something to all foos if we need to. Set was chosen here because on many STL implementations it can’t be zero-initialized without the entire world exploding, which is not true of vector. (Vector will, however, leak memory if you use it while zero-initialized, but I digress.)

The problem is this: what gets constructed first, my_obj or foo::all? The answer is: we cannot know. If my_obj is inited first, foo::all contains, well, I don’t know what, but certainly not the necessary dynamically constructed parts to make a set. Thus my_obj will cause a crash before main or do something else not like what we want. If my_obj is initialized second, we go home happy. Only your C++ compiler knows for sure.

(In the case of vector, the fail case is: the global vector gets zeroed if your compiler is into that thing, then the client code puts an object into the vector, since a zero vector is legitimate in a lot of STL implementations, then the real constructor zeros it out again, leaking memory and "losing" your object mysteriously.)

I just went through this fire drill with X-Plane when making a stats-counter class; the stat object tends to be static to a client’s code so that it is "just there and ready", and some internal book-keeping keeps a global map of them around so we can zero all counters by category. Since static initialization was important, my solution was: use an intrinsically linked list to chain the objects together for tracking. Because the head of the list is just a dumb pointer initialized to zero, it’s guaranteed to be correct before any code runs. Each constructor simply updates the head pointer and we end up with a linked list with no conflicts.

Generally I can recommend a few techniques to avoid such constructor chaos, but no one technique will fit all:

  • If you have a translation unit that forms an "API", don’t use static objects to initialize your internal state if you depend on an external API. If you can’t avoid this (because for example you have global STL variables and some kind of real initialization) consider breaking the initialization up and doing the initialization later.
  • Dynamically allocate global API-related stuff using operator new, either in an explicit initialization function (called after main) or upon first use of the API.
  • If you can avoid using globals in an API implementation (and instead require some kind of "handle") you can push this problem off to client code.
  • Use explicit initialization of sub-systems. It’s simple, debuggable, and you never get into static-constructor trouble.

One comment on that last point: if you build up a table of static constructors to build object factories, you’re going to have to explicitly initialize that table anyway. That’ll have to be another blog entry too.
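
As one concrete example, the "first use" option from the list above is just a function wrapped around the global - a sketch:

#include <set>

class foo;

// The set is built the first time any code asks for it - never before,
// and never in a racy order relative to other files' constructors. It is
// deliberately leaked so it also survives static destruction order.
static std::set<foo*>& all_foo()
{
    static std::set<foo*> * s = new std::set<foo*>;
    return *s;
}

// foo's constructor calls all_foo().insert(this) instead of touching a
// static member directly, so cross-file construction order stops mattering.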

Thursday, May 18, 2006

Installing Panther over Tiger

I’m blogging this because I’ll never remember it otherwise. Here’s what I had to do to put Panther (OS X 10.3) back onto a disk that had been upgraded to Tiger (OS X 10.4). I did this to set up a coding environment to regress OS-specific problems; I don’t recommend this otherwise.

This was done on a system with two internal HDs, so I could be booted into 10.4 on the main HD while screwing around with the second HD. First I wiped out Tiger as best as I could using sudo rm -r in the terminal. I deleted everything that looked Unixy or OS-ish, including the root level dirs /System /var /tmp /etc /private /bin /sbin /usr and anything else that tempted me. Warning: this is a good way to instantly totally destroy an OS installation.

The trickiest part turned out to be that for some reason my old 10.3 install disk that shipped with the G5 doesn’t appear “blessed” under 10.4. Blessing is basically a note on the disk as to where the boot information lives for Macs. The best way to determine what’s going on is with the aptly named “bless” command-line utility; the man page explains what it does. bless --info will show you if a volume is not bootable; if it’s not, then booting with the “c” key (or any of the other 10 ways to boot from CD-ROM) will fail.

So now we come to Open Firmware. Open Firmware is, well, I don’t know exactly what it is, but for our purposes it’s a command shell before anything is booted where we can monkey around. Before booting into Open Firmware, one thing to check: use pdisk (L command) to list the partitions of all drives - we’ll need to know the partition number of our CD-ROM’s main partition. Strangely it appears that all Mac partitions are really two partitions - a small header and then a real partition. So the CD-ROM partition number is “2”, which we’ll need later but not be able to find from Open Firmware.

To boot into OpenFirmware, hold down the command, option, ‘o’ and ‘f’ keys all at once on boot. You should see some kind of command prompt. The money command is:

boot cd:2,\System\Library\CoreServices\BootX

This basically means boot from partition 2 from the device aliased to “cd” using BootX (that’s a unix file path but with weird slashes). BootX basically always lives on that path on modern OS X installations. BTW if that file doesn’t exist on your CD-ROM, it may not be bootable.

Other useful commands:

devalias - lists all the aliases to devices. Finding devices in the tree is harder if there aren’t aliases, but my G5 seems to have a bunch of nice ones.
dev [device] - change the current device in the tree, similar to cd.
pwd - print the current device path.
ls - list devices within the current device.
dir - list files. Syntax is something like dir cd:2,\ to list the root dir of the second partition on the device aliased to CD.

Friday, May 12, 2006

It Had To Be That Way

I just found an extremely rare bug in X-Plane caused by an uninitialized variable in a constructor; this code functioned in such a way that a compiler cannot do the code analysis to find the bug, and unfortunately it happens rarely enough that it’s probably in the shipping sim.

To paraphrase the philosophy of C++:

  • A C++ programmer can do anything, no matter how stupid. After all, 0.001% of the time it might be necessary.
  • The fastest performance path must be accessible via C++.
  • Reducing development time by catching dumb mistakes isn’t even remotely a language goal.

Given this, it’s understandable what happened; the language has to allow me to leave junk in my data because it’s faster not to initialize it, and sometimes I want to be lazy for speed. Unfortunately it means that catching errors is up to me, and I am human and fallible, especially when I’ve been drinking beer all night.

It got me thinking about whether there could be a language that provides the performance options of C++ but without the “dangerous environment” of C++. Java and C# are managed; I am definitely among the snotty bitflingers who think that for my app garbage collection and managed memory mean unacceptable performance loss. This probably isn’t true 99% of the time, but in the case of X-Plane, we’ve got a number of specialized allocators (hrm - future blog?) that give us better memory performance than we could get by just newing and deleting objects. (This is indeed the 0.001% that C++ caters to.)

As a straw man, I’m imagining a language where you have to declare your intention to sin. Basically the rules of the language are restricted until you apply some kind of attribute, similar to static. So most classes would work the slow way, e.g. automatic initialization, perhaps managed memory, who knows, but then when you tag a class as low-level, you assume responsibility for all aspects of the environment.

My guess is that we’d have to apply such a tag to a very small number of classes in X-Plane, and thus we’d get better compiler support for most of our code.

Sunday, May 07, 2006

std::string + __FILE__ = malloc

So you start off like this:

#define CHECK_ERR(x) __CHECK_ERR(x,__FILE__,__LINE__)
void __CHECK_ERR(const char * msg, const char * file, int line)
{
if (g_error != 0)
printf("An error happened: %s (%s: %d.)\n", msg, file, line)
}

That’s a little goofy but we actually use something like this in X-Plane as a rapid debug-only way to spot OpenGL errors. The __FILE__ and __LINE__ macros give us pin-point messages about where the error was caught even on machines where we can’t easily attach a debugger.

Then later on you decide to embrace the STL string and do this:

void __CHECK_ERR(const string& msg, const string& file, int line);

Ouch. The danger (well, one of the dangers) of C++ is that it will change from a very low level to a very high level of abstraction, changing the underlying implementation from something fast to something slow, without ever telling you.

The problem is that despite using const string& for speed, your inputs (__FILE__ and __LINE__) are string literals. This code will thus create a new string object using the const char * constructor every time the function is called, and clean up the object when done. In other words, __CHECK_ERR, which was a single if statement, now allocates and deallocates memory. Ouch! Pepper error checks liberally around the tight loops that emit OpenGL calls and you’ll really see performance suffer.

This is a fundamental problem in picking between char * and std::string. If your API is declared in terms of char * but your internal implementation and client code use STL strings, the conversion means a slow allocation of memory where you could have had reference counting. But if your client code and implementation use char * and the interface uses STL strings, you allocate a string that isn’t needed.

Our solution with X-Plane is to know our client; virtually all routines use STL strings, except the cases where we know we’re going to be fed by a string literal, like a __FILE__ macro. In those few cases we revert to const char * and convert to STL strings at the latest possible moment, to avoid the memory allocation.
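
In sketch form (following the post's example; the conversion to std::string happens only on the rare error path):

#include <string>
#include <stdio.h>

extern int g_error;

#define CHECK_ERR(x) __CHECK_ERR(x, __FILE__, __LINE__)

// The hot path - no error - is a single integer compare: no string object
// is ever constructed for the literal arguments.
void __CHECK_ERR(const char * msg, const char * file, int line)
{
    if (g_error != 0)
    {
        std::string report(msg);   // allocate only when something went wrong
        printf("An error happened: %s (%s: %d.)\n", report.c_str(), file, line);
    }
}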

Saturday, May 06, 2006

Cleanliness is next to…well, something

At a past company I used to debate the merits of various software engineering techniques with my coworkers. (When someone touched a header that was precompiled we had plenty of time to do this - our product could take hours to rebuild under Visual Studio, which I think spoke against at least certain practices, but that’s another post.) We were very focused on shipping product and helping the company’s business, so the question was: does this practice really make money, or does it just make engineers happy?

One thing we’d debate was whether writing clean looking code was worth anything…certainly the compiler doesn’t care if your code looks like this:

void light_mgr::prep_lighting_state(light_type in_type, float in_coords[3])
{
if (settings_mgr::use_slow_lights())
setup_textured_lights (in_type );
else
setup_untextured_lights(in_type );

set_light_ref (in_coords);
}

or this

void light_mgr::PrepLightingState(light_type t,
//async_mgr* /*fMgr*/,
// JJ - removed 4/10/02 float coo[3])
{
#if USING_NEW_LIGHTS
if (settings_mgr::use_slow_lights() && USING_NEW_LIGHTS)
setup_textured_lights(t);
#else
setup_textured_lights(2);
#endif
else
/ * Setup_untextured_lights(10);*/
Setup_untextured_lights(t);

SetLightRef(NULL); // COORDS);
}

At the time I was at least partly convinced that we software engineers tried to make things cleaner than was needed for the bottom line of the company, because it’s more pleasant to work on the top code than the bottom, which gives you that warm moist feeling of wading through a swamp. I still think, as I did then, that programmers are able to work in less-than-optimal code environments.

I’ve been working with Laminar Research for a while now. One thing Austin insists on is scrubbing the code base regularly. He would never leave code like the lower example in the sim. And I think there is a real benefit: ergonomics.

As programmers we sit in front of the computer screen and concentrate on code all day. I can’t speak for other programmers, but for me at least I am often bounded by concentration - that is, how long can I keep focused on many small diverse details so that the code I write is correct the first time? The answer is…a pretty long time but not forever.

To this end I think there is a benefit from keeping the code clean: I’d rather be spending my mental energy on the work at hand than on the mental translation from the lower to upper code snippet every time I look at it. I find that working on X-Plane is less tiring than other code bases I’ve looked at, and I think that that’s partly because we sweep the cruft out regularly.

Wednesday, May 03, 2006

safety_vector

Austin is undertaking a major refactoring of X-Plane right now. First a little C++: we have a lot of code that's hard-coded to work only on the user's aircraft, which is item 0 of an array. Austin wanted to easily detect every case where we were hard-coding 0 as an index and edit the code to be generic for any aircraft in our array of aircraft. This is a typical case of a refactoring where, due to the sheer quantity of code and the lack of a single overarching abstraction, we run the risk of missing cases and leaving in bugs. In the case of our flight model, doing half of our work on one plane and half on another would lead to a category of bugs that's almost impossible to detect from using the product (e.g. systems mysteriously failing only under certain intermittent conditions). I'm not normally a big fan of C++ gymnastics but here's what we came up with:

  1. We invented a named enumeration for our index into our aircraft array. Because we have finite aircraft, making an enum for each aircraft was trivial.
  2. We overloaded operators ++ and -- so we could use the enumeration in for-loops. (C++ doesn't naturally provide math operators for named enumerations.)
  3. We moved our aircraft from a real C array to a vector subclass called "safety_vector". Safety vector is a vector with the regular [] operator declared private and a new [] operator declared that takes a named enumeration rather than an integer (okay, std::size_t) as its parameter.

What does this add up to? Code where this is okay:

for (PLANE_INDEX p = plane0; p < plane8; ++p) analyse_engines(p);

But this provides a compile error:

analyse_engines(0);

Perfect! One other comment on this operation: this kind of wide-scale refactoring of the X-Plane code happens all the time. And having watched this happen for a few versions (and worked on other products with conventional life-cycles), I would have to say that continual refactoring is not a source of bugs. (What is a source of bugs, I think, is code interdependency, which makes regression testing hard by causing bugs in areas we wouldn't expect to be affected by feature work.) The benefit of the continual refactoring is that the code doesn't have any cruft. When we go to put new features in, we have a clean workspace, which I think speeds implementation.
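
For concreteness, here's a minimal sketch of how such a safety_vector can be assembled. The names PLANE_INDEX and plane0/plane8 come from the snippets above; the rest is my guess at the shape of the real thing, not the actual X-Plane code:

#include <vector>
#include <cstddef>

enum PLANE_INDEX { plane0, plane1, plane2, plane3,
                   plane4, plane5, plane6, plane7, plane8 };

// C++ won't increment a named enum on its own, so we provide ++ for for-loops.
inline PLANE_INDEX& operator++(PLANE_INDEX& p)
{
    p = static_cast<PLANE_INDEX>(p + 1);
    return p;
}

template <class T>
class safety_vector : public std::vector<T> {
private:
    T& operator[](std::size_t);    // integer subscript: private and never defined
public:
    T&       operator[](PLANE_INDEX i)       { return this->at(i); }
    const T& operator[](PLANE_INDEX i) const { return this->at(i); }
};

With this in place, aircraft[plane0] compiles but aircraft[0] does not - the integer subscript resolves to the private overload.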

Tuesday, April 25, 2006

C--

I'm surfing C++ entries on Wikipedia…and it's reinforcing my belief that the best use of C++ is to ignore most of the language. Simply put, I consider C++ to be too feature-rich for serious industrial programming. Off-hand beer-induced rules of thumb:

  • The choice to use a C++ feature should be based on maintainability and debuggability, not just the ability to code it. In particular, take a good look at how badly most debuggers handle very complex templates, then think twice before building some kind of monster templated data model. (I've done this, and man does it burn you later.) It had better be worth some payoff, because you will pay for it in debugging.
  • In that spirit, don't go too crazy with templates. Templates are cool and can do some things very well. By all means use vector and map. But at some point too many templates will cause your compiler to lose its mind and generate tons of slow code. Compiler technology continues to get better and better, but C++ templates can demand a lot.
  • Avoid operator overloading unless the use of the overloading is so blatantly obvious that even a monkey could figure out what you mean. It's too easy to get clever with operator overloading and make code that's totally impossible to understand. + should mean plus, * should mean multiply or dereference, and if you're not sure which one, don't overload operator* at all!
  • If you need a pointer to a member function, your design may be screwed up.
  • Think twice about function overloading. It's not inherently a bad idea (unless the function is virtual, no pun intended) but you can always make the names more descriptive and the code more maintainable.
  • I believe that private inheritance and virtual inheritance are both signs of a problematic OOP design, but that's a complex enough subject that I'll have to write a separate rant about it. I will just say here that private inheritance and virtual base classes are two features you shouldn't need to use a lot.
  • I have a psychotic hatred for iostreams. That'll have to be a separate rant too, but for now: if you're going to invest time to learn a byzantine cross-platform I/O API, learn stdio, not iostreams. You can use it in pure C code too, and someday when you have to write code with moderately good performance you'll be glad you did.

Scott Meyers said it best in Effective C++: say what you mean and understand what you say. By far the biggest challenge of programming is programming itself, that is, dealing with piles of code, especially when it's not yours or you haven't looked at it in three years. To that end, if a C++ feature makes it hard to understand what you're saying, I say don't use it.

Thursday, April 20, 2006

The Fine Print…

http://developer.apple.com/documentation/developertools/Conceptual/CppRuntimeEnv/Articles/LibCPPDeployment.html

Little did I realize that Apple only started providing libstdc++ at the last second. My take on all this is: the GCC 3/4 transition is a little more painful for Mac developers than it would have otherwise been because of the Intel Macs. Normally the workaround would be to sit on GCC 3 for a while for compatibility, but only GCC 4 has x86 support on the Mac.

With Syntax Like This…

…who needs enemies? I wrote this code for a GUI_ScrollBar, a derivative of GUI_Control that overrides a function, calls the base, then refreshes itself. Ignoring all of the sins of a design like this…

void GUI_ScrollBar::SetPageSize(float inPageSize)
{
    GUI_Controll:SetPageSize(inPageSize);
    Refresh();
}

What does this code do? It loops forever and overruns the stack. Here's a hint: two L's and one colon, not one L and two colons. Now here's the cute part…there is no entity GUI_Controll with two L's anywhere. Why does it compile? This is an exercise left to the reader, one that may make you want a drink.
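
(A spoiler sketch for those who'd rather skip the drink - the annotations are mine:)

void GUI_ScrollBar::SetPageSize(float inPageSize)
{
    GUI_Controll:             // one colon: parsed as a goto label, so the name never has to exist
    SetPageSize(inPageSize);  // unqualified call resolves to GUI_ScrollBar::SetPageSize - infinite recursion
    Refresh();
}

Since a label is just a name, the compiler is perfectly happy - at best you get an unused-label warning.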

Tuesday, March 28, 2006

fork + execve - so happy together!

A long time ago in the 70s when everyone smoked a lot of dope, someone thought it would be cute to break the application launch APIs into two pieces. First you call fork to clone yourself (but this is really fast because the OS is lazy) and then execve turns one copy into something new. Cute!

I just learned two interesting things about Mac OS X tonight:

1. Apparently fork without execve won’t always work.

2. execve without fork doesn’t work so well either.

In the first case, you can read here about why forking doesn’t always work. In Apple’s defense, I don’t blame them for this - fork is supposed to be fast, and cloning all of the app’s resources wouldn’t be. Oh yeah, and some of those resources might be owned in some exclusive mode. Historically fork was a way to use processes like threads. Or are threads processes? I don’t know…it’s 2:30 AM and my brain hurts.

In the second case, frankly I have no idea why execve won’t work on its own. I get error 45:

#define ENOTSUP 45 /* Operation not supported */

My speculation is that execve requires a “clean” process of some type, e.g. a forked process and not a regular one.

Either way the moral of the story is clear: apparently no one has tried to use these two functions separately since they made disco illegal, so neither should you. Fork even if you don’t need to, then kill off the parent.
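
For reference, here's the canonical pairing as a minimal sketch - a simple launch-and-wait, with error handling abbreviated:

#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>

int launch_and_wait(const char * path, char * const argv[], char * const envp[])
{
    pid_t child = fork();
    if (child == -1) { perror("fork"); return -1; }
    if (child == 0)
    {
        // Child: replace ourselves with the new program.
        execve(path, argv, envp);
        // We only get here if execve failed - bail out rather than
        // falling back into the caller's code in two processes.
        perror("execve");
        _exit(EXIT_FAILURE);
    }
    // Parent: reap the child so we don't leave a zombie.
    int status;
    waitpid(child, &status, 0);
    return status;
}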

Tuesday, February 14, 2006

So that’s where HINSTANCES come from

I realize that this is probably totally trivial for real Win32 developers, but it took me long enough to find this that I’ll blog it and help Google guide future developers.

RegisterClassEx requires an HINSTANCE…I always thought of this as “that thing that is passed to WinMain.” Now I realize - it’s a reference to the module (the EXE or DLL) that the window class’s function is in.

So if you’re like me and want to register a class from library code without having that HINSTANCE from WinMain and you’re not using DLLs, you can just use GetModuleHandle(NULL) and call it a day. Easy!
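
A minimal sketch of the idea (the class name is made up, and DefWindowProc stands in for your real window procedure):

#include <windows.h>

void register_my_window_class(void)
{
    WNDCLASSEX wc = { 0 };
    wc.cbSize        = sizeof(wc);
    wc.lpfnWndProc   = DefWindowProc;          // stand-in - use your real window proc
    wc.hInstance     = GetModuleHandle(NULL);  // a handle to our own EXE - no WinMain needed
    wc.lpszClassName = TEXT("my_window_class");
    RegisterClassEx(&wc);
}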

Saturday, February 11, 2006

var-args and macros? Sure!

I didn’t realize until recently that this works:

#define WED_ThrowPrintf(fmt, ...) \
    throw wed_error_exception(__FILE__, __LINE__, fmt, __VA_ARGS__)

The reason I like var-args and macros together: they make thorough error reporting relatively easy in C++ without a lot of code.

if (err != noErr) WED_ThrowPrintf("Unable to open document %s (OS Error %d: %s)", file_name.c_str(), err, OSTranslate(err));

C++ makes almost anything possible, but does it make it easy? I’m a lazy human programmer; the easier a subsystem is to use, the more I’ll use it. So it makes sense to make good practices easier. The easier it is to encode a lot of context information into an error report, the more likely I am to do it, and the more likely a field-report of an error will contain useful diagnostic information.
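
The exception class itself can be tiny. Here's one plausible shape - a sketch only, not the real wed_error_exception:

#include <cstdarg>
#include <cstdio>
#include <string>

class wed_error_exception {
public:
    wed_error_exception(const char * file, int line, const char * fmt, ...)
        : mFile(file), mLine(line)
    {
        char    buf[1024];
        va_list args;
        va_start(args, fmt);
        vsnprintf(buf, sizeof(buf), fmt, args);  // format the message once, up front
        va_end(args);
        mWhat = buf;
    }
    const char * what(void) const { return mWhat.c_str(); }
private:
    std::string mFile;
    int         mLine;
    std::string mWhat;
};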

Friday, February 10, 2006

What makes a professional?

“My definition of an expert in any field is a person who knows enough about what’s really going on to be scared.” - P.J. Plauger

It doesn’t get any simpler than that. You can always spot the inexperienced programmers in a meeting saying “Oh yeah that can be done. That’s easy!” That’s when you know to be terrified.

Saturday, February 04, 2006

Don’t Trust Error Messages

I just spent some time chasing an error message that I was getting from a Windows Media component when I tried to open a Windows Media Stream. The error was NS_E_BAD_CONTROL_DATA which is defined as:

The server received invalid data from the client on the control connection

This is a very cryptic message because when dealing with COM objects, the terms “Server” and “Client” can have many many meanings. In this case, I had no real clue whether it was referring to a conventional network Server and Client or a COM server and client. That threw me off for a bit.

I ended up finding the problem, and it wasn’t even remotely close to what the error claimed. The problem was that the destination folder I planned on recording the stream TO didn’t exist. Yeah, it’s a stupid mistake, but what does that have to do with reading the stream? Remember, it was the WMASFReader that died, not the WMASFWriter.

In any case, it’s important to remember that sometimes errors will occur and will not be caught for quite some time, or sometimes not at all. If their effects cascade throughout your program, you may find them causing errors that don’t make sense. If you could peel back the layers of the onion enough to see Microsoft’s underlying code, maybe the error and how it got there would make sense; from the outside, however, errors will often seem unrelated to their causes and can be VERY misleading.

Do NOT fixate on what the error SAYS. Work backwards and find out what triggered it.

Thursday, February 02, 2006

Your Friend the Priority Queue

One of the things I like about the STL is that it makes it easy to use, in C++, higher-level data structures that make programs better but would otherwise be too cumbersome to build. For example, when was the last time you coded a red-black tree in C? But std::map is pretty easy.

Here’s my extremely unsound and imprecise definition of a priority queue: a priority queue is a data structure where you can rapidly insert “todo” items, find the most important one, reprioritize an item, and remove a finished item. Ideally I’d like all of these tasks to run in logarithmic time or better.

One way to hack out a priority queue using the STL is like this:

struct my_item;
typedef std::multimap<float, my_item *> queue_type;
struct my_item {
    queue_type::iterator self;
    int                  more_stuff;
};

The idea is simple: the map already gives us logarithmic time to locate the first item and to insert and remove an item. The struct holds its own iterator, allowing an item to reprioritize itself by removing and re-inserting itself, like this:

void reprioritize(my_item * item, float new_prio, queue_type * a_queue)
{
    a_queue->erase(item->self);
    item->self = a_queue->insert(queue_type::value_type(new_prio, item));
}
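
Finding and consuming the most important item is then just a peek at one end of the map. A sketch, assuming lower keys mean more important:

my_item * pop_most_important(queue_type * a_queue)
{
    // begin() is the smallest key; erasing by iterator is cheap.
    queue_type::iterator i = a_queue->begin();
    my_item * item = i->second;
    a_queue->erase(i);
    return item;
}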

I can’t say that this is terribly good C++, but I can say that it works quite well. An example of a priority queue used to create X-Plane scenery:

We use a greedy algorithm to build our triangulations from elevation data. Basically we find the location on our mesh that is the most wrong and fix it by inserting a new point into our mesh. We then re-evaluate the mesh and fix the next-worst point. We repeat until we’ve added too many points or the worst point isn’t that bad. Here’s where the priority queue comes in: when we insert a point, the error between our local mesh and the ideal elevations changes, but only for a few points near the insert. So we want to rapidly reprioritize those points without having to reconsider the others. With 1.5 million potential points and perhaps a few hundred thousand inserts, the priority queue is critical. With a good priority queue this algorithm takes a few seconds; without it, a few hours. (This algorithm comes from a paper - I wish I could find it, but I did not record the URL in my code. When I find it I will post a link.)

When should you think priority queue? Well, any time you have to evaluate a whole series of conditions at specific events, and the time of the events is known in advance, a priority queue can help. It may be necessary to replace polling logic (e.g. “is it time to flush the queue now”) with solid predictions (e.g. “I know that this queue is 25 bytes away from being full and that one is 13 bytes from being full”).

The STL provides another approach to priority queues: tucked away in the algorithms section are routines like make_heap, is_heap, push_heap, and pop_heap. A heap is basically an array of data that’s sorted just enough that the highest-priority item (by default the largest, given std::less) is in front. It’s faster to maintain these relaxed requirements than to keep a full sort. I haven’t used heaps in my code yet, so I can’t comment on heap vs. map performance.
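
A quick sketch of those routines in action - here inverted with std::greater so the smallest priority pops first, to match the multimap version above:

#include <algorithm>
#include <functional>
#include <vector>

std::vector<float> todo;

void push_prio(float prio)
{
    todo.push_back(prio);
    std::push_heap(todo.begin(), todo.end(), std::greater<float>());
}

float pop_prio(void)
{
    std::pop_heap(todo.begin(), todo.end(), std::greater<float>());  // moves the min to the back
    float top = todo.back();
    todo.pop_back();
    return top;
}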

Tuesday, January 31, 2006

Debugging OpenGL

There are only two debugging techniques in the universe:

  1. printf.
  2. /* */

(I would say a nice IDE that gives you stepping, logging, variable examination, etc. is just a really nice variant of these two techniques that avoids recompiles.)

OpenGL poses some particular challenges to debugging: often OpenGL’s failure mode will be to draw nothing, or fill the entire screen with solid black or white. A lot of OpenGL’s internal state is not easily printed by the application, and since OpenGL is state-based, removing code can have cascading effects.

You can still utilize these debugging techniques with OpenGL. Here are some things I’ve done when working on X-Plane to try to retain my sanity while debugging OpenGL code:

  • Premature swapping: in a double-buffered context, you can swap the context and then wait for a mouse click. If you do this at intermediate points in your drawing, you can see your drawing in steps as it takes place. This is sort of like printf in that it lets you see internal program state that would otherwise be hidden.
  • Removing drawing state: when something goes wrong you can start to remove the OpenGL tricks you do until you arrive at a simpler problem. Turn off texturing, turn off texture projection, turn off blending, etc. Lighting is a good one to turn off when things get strange, since bad normals can cause abnormal results.
  • Use a different primitive. If your triangle fan looks funny, try drawing it as a line loop. Or set the polygon mode from fill to line - the wire frame may reveal something not obvious in a solid model.
  • If you use display lists, set up your code to allow for immediate-mode execution as an option; this will let you catch display-list-related problems (I have seen drivers do things differently in display lists, and there are lots of ways to make coding mistakes with them) and also let you apply other techniques inside the list, like swapping to see work in progress.
  • #define common modes: for all of the above tricks, if you’re going to be doing them a lot, design the code for debugging. I try to leave a series of #define switches at the top of my translation unit that apply debug-mode effects.
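
A sketch of that last idea - every name below is invented, and swap_buffers/wait_for_mouse_click stand in for whatever your platform layer provides:

#include <GL/gl.h>  // <OpenGL/gl.h> on the Mac

// App-specific glue, assumed to exist elsewhere:
void draw_terrain(void);
void draw_aircraft(void);
void draw_clouds(void);
void swap_buffers(void);
void wait_for_mouse_click(void);

#define DEBUG_STEP_DRAWING 0  // swap and wait for a click after each stage
#define DEBUG_NO_TEXTURES  0  // strip texturing to simplify the problem
#define DEBUG_WIREFRAME    0  // draw filled geometry as lines

static void debug_checkpoint(void)
{
#if DEBUG_STEP_DRAWING
    swap_buffers();
    wait_for_mouse_click();
#endif
}

void draw_scene(void)
{
#if DEBUG_WIREFRAME
    glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
#endif
#if DEBUG_NO_TEXTURES
    glDisable(GL_TEXTURE_2D);
#endif
    draw_terrain();  debug_checkpoint();
    draw_aircraft(); debug_checkpoint();
    draw_clouds();   debug_checkpoint();
}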

As a side thought, I have found Vertex Buffer Objects (VBOs) to be easier to debug than display lists for a few reasons: it’s pretty easy to revert VBOs to client vertex arrays (CVAs), at which point you can inspect the data in the debugger. Display lists can contain state commands as well as geometry, so it may not be obvious what the full effects of calling a display list are; since VBOs are just geometry, you know what you are getting.

An example: in X-Plane we had some code that essentially did this:

static int has_dl = 0;
if (has_dl == 0)
{
    // Build the car display list the first time we need it.
    has_dl = glGenLists(1);
    glNewList(has_dl, GL_COMPILE);
    draw_a_car();
    glEndList();
}
glCallList(has_dl);

This would be okay except that everything draw_a_car does gets compiled into the list. Furthermore, any non-OpenGL side effects of draw_a_car get run only once. Neither of these problems is made clear by the code.
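
Applying the immediate-mode escape hatch from the list above to this example would look something like this (the #define is mine):

#define DEBUG_NO_DISPLAY_LISTS 0

#if DEBUG_NO_DISPLAY_LISTS
    draw_a_car();  // immediate mode: side effects run every frame, and you can step through the GL calls
#else
    static int has_dl = 0;
    if (has_dl == 0)
    {
        has_dl = glGenLists(1);
        glNewList(has_dl, GL_COMPILE);
        draw_a_car();
        glEndList();
    }
    glCallList(has_dl);
#endif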

IsWindowVisible can be deceiving!

Recently I had to debug a bit of code that was causing an ActiveX control to be hidden. I found that if I commented out the two ShowWindow lines below, the problem went away.

bool bVisible = mChild.wnd.IsWindowVisible();
mChild.wnd.ShowWindow(SW_HIDE);
mChild.wnd.MoveWindow(x, y, w, h);
mChild.wnd.ShowWindow(bVisible ? SW_SHOW : SW_HIDE);

That snippet of code basically saves the current state of the window, then hides it, moves it, and sets the state back to what it was before. So it should be completely safe, right? WRONG! The concept seems to make sense, but a careful look at the Microsoft documentation will reveal the problem. Look particularly at this sentence:

If the specified window, its parent window, its parent’s parent window, and so forth, have the WS_VISIBLE style, the return value is nonzero. Otherwise, the return value is zero.

So let’s assume that the child window (the one we’re trying to move) has its WS_VISIBLE bit set so that it can be visible. Let’s also assume that its parent, its parent’s parent, or both are NOT visible. IsWindowVisible will then report that the child window is NOT visible. This makes sense - the user clearly cannot see the window if a parent is hidden - BUT the child window’s own bit is still set to visible. Our code snippet doesn’t realize that and hides the child window. That’s fine for now. We perform our window move, but here’s where it dies: we go to set our child window back to its original state…which was what? Its own state was visible, but that’s not what IsWindowVisible told us, so we leave the child window hidden. Later, when the hidden parent is finally displayed and should show the ActiveX control, the child window has been explicitly hidden and cannot be seen.

Bug in the code? Yep! Bug in the documentation? Possibly. I think the biggest problem is that the naming of this function is a bit misleading. Developers who use this function should ask themselves the question from the user’s standpoint: “Is this window visible to the user?” That’s how IsWindowVisible will answer it. Most developers have a tendency to become focused on the window they’re trying to manipulate and forget that it’s really a piece of a bigger picture.
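
The fix is to save the child's own WS_VISIBLE bit rather than asking IsWindowVisible. A sketch, assuming mChild.wnd is an MFC CWnd as the calls above suggest:

// Save the child's own style bit, not the composite visibility.
bool bVisible = (mChild.wnd.GetStyle() & WS_VISIBLE) != 0;
mChild.wnd.ShowWindow(SW_HIDE);
mChild.wnd.MoveWindow(x, y, w, h);
mChild.wnd.ShowWindow(bVisible ? SW_SHOW : SW_HIDE);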

Thursday, January 26, 2006

X-Code fun with settings

More X-Code ranting and raving…

One thing that’s nice about X-Code is that it shows you the relative path fragments used to track each project item. In particular, if you have a tree of files represented as a tree of items in X-Code, make sure they’re set to relative-to-enclosing-group! If you do this, then you can move whole source trees around and reconnect them in X-Code with a single command. X-Code’s reconnecting to missing files is pretty good anyway, but setting everything to enclosing-group-relative up front can really save time later.

Last night Austin and I got X-Code into a state where the debugger was clearly out of sync with the source code. Scary! I’ll post more if we can figure out what was going on.

A trick from the Apple guys: X-Code uses environment variables to build up the build environment in a huge, complex, frightening, recursive set of entries…it is very powerful and flexible, but it’s not always obvious what set of effects are combining to create the final project. But they pointed out that if you make a shell-script post-build step (in the target view, pick Project->New Build Phase->New Run Script Build Phase) then you can enter a dummy shell script. When the script is executed, X-Code will first use 8 tons of “setenv” commands to dump the internal state of the configuration into the shell environment. The result is visible in the command-line view (normally hidden) in the build progress window. In other words, you can use this to easily dump every single configuration variable’s final value and understand what you’re really getting.

I’ve spent the morning really cleaning out the project - our target now has ZERO customized target settings! In other words, when we make a new target we will not have to change ANYTHING. Or so the theory goes.

More X-Code Ranting - and Tips

I have in the past been a big CodeWarrior fan…here are some thoughts on transitioning to X-Code…(most of the credit must go to the DTS and Apple tools team for these tips…)

- Editor and code browsing…you can get X-Code to do almost everything CodeWarrior can in terms of useful code management. Command-double-click rather than option-double-click to jump to a definition, and the syntax coloring can’t color keywords, but finding symbols and jumping between source files is there.

- There’s a preference to make the layout look like a CodeWarrior project. I am trying to get myself used to the default view because it seems like it conveys more information in the long term. X-Code has a VC.net-like mode too in case you like pain.

- Compile is slow, but it’s also per-CPU and networkable. If you have a dual-core or dual-CPU machine, that’s a nice win and starts to close the gap. If you have a lot of headers, compile is actually faster than CodeWarrior when you crank the optimizers on both compilers. (Warning: networked compile does not work between machines with different endiannesses.)

- Once you understand the concept, preferences management in X-Code is definitely superior to CodeWarrior. In X-Code, every setting is a key-value pair. The target preferences are layered on top of project preferences - any preference can be anywhere. This means that you can factor the vast majority of your preferences out to your project, which simplifies making global settings changes.

- X-Code has a real concept of debug and release builds…in CodeWarrior, if you have five applications that need to be debug, inlined debug, release, optimized release, and optimized universal release, then that’s 25 targets. In X-Code that’s 5 targets and 5 build modes; most of the target settings are factored into the project anyway. The end result is a lot fewer settings check-boxes.

I’ll try to post more as I get there. Basically it only took one day to convert X-Plane from CodeWarrior to X-Code, and one more day to make X-Plane run on Intel Mac hardware.

Tuesday, January 24, 2006

Rebate Scams

This is a very important article to read if you’re like me and you love purchasing items that come with rebates. It could save you some hassle and some cash!

http://www.slate.com/id/2084210/

Wednesday, January 18, 2006

X-Code Scorecard

I’m in the process of converting our app from CodeWarrior to X-Code. X-Code is Apple’s IDE based on the GNU toolset.

Good things about X-Code (most of these things are good in CodeWarrior already):

  • Recursive include directories work.
  • Precompiled headers work reasonably well.
  • Manages dependencies, more or less.
  • Easy to temporarily preprocess and view a file.
  • Targets and build configurations are separate - this is an improvement over CodeWarrior.
  • View-by-groups is a nice way to work.

Bad things about X-Code (mostly inherited from GCC):

  • Compile speed is significantly slower than CodeWarrior.
  • Dependency checker sometimes goes haywire and recompiles the PCH - at that point go get a cup of coffee.
  • GNU STL - enough said.
  • No per-group compile settings - too bad given the flexibility of the rest of the system. You can’t do this in CodeWarrior either, but you can in VC.net.

If you tried X-Code a few years ago when it first came out, it’s definitely in better shape now. But I’d trade all of the features not found in CodeWarrior to get my compile speed back!

Fun with threads

When it came time to rewrite the X-Plane threading code (which has to run on three APIs) I did what any modern lazy programmer does: I typed “posix threads condition variable” into Google and got this helpful page.

http://www.cs.wustl.edu/~schmidt/win32-cv-1.html

I can’t vouch for how accurate its contents are, but I can tell you that the authors are a lot smarter than I am; I say that because I made the mistake of changing their code and induced a bunch of bugs. But I learned something in the process!

Posix provides condition variables. Basically, one or more threads block on the condition variable, and when the condition variable is signaled, one or more threads wake up and do something useful. When you block on a condition variable, you specify a mutex to release - that release is atomic, meaning no other thread can acquire the mutex you are dropping before you are asleep and blocking on the condition variable. When your thread wakes up from the condition variable, it then acquires the mutex. (This is not atomic - it’s very possible that when you wake up, the mutex is locked by someone else and you’ll go right back to sleep waiting for the mutex. This sounds to me like the recipe for a box-car problem, but that’s off-topic.)

The zen of the condition variable is this: when you have some kind of producer/consumer problem you use a mutex to protect the precious internals of your object and the condition variable to implement blocking operations. A blocking operation acquires the mutex, looks at the internals, decides it needs to sleep, and atomically drops the mutex and blocks on the condition variable. When it fully wakes up (having been signaled by the condition variable and reacquired the mutex) your thread now can re-examine the internals and decide whether it can proceed.

In pthreads terms, the pattern looks something like this (the “guts” helpers and the sleepers counter stand for whatever your object’s internals need - they’re placeholders, not real API):

pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  my_cond  = PTHREAD_COND_INITIALIZER;
int             sleepers = 0;

void blocking_op(void)
{
    pthread_mutex_lock(&my_mutex);
    ++sleepers;                                // mark guts that I am sleeping
    while (!guts_are_ready())                  // guts say I am not ready
        pthread_cond_wait(&my_cond, &my_mutex);
    consume_guts();                            // guts are ready - do something with them
    --sleepers;                                // mark guts that I am not sleeping
    pthread_mutex_unlock(&my_mutex);
}

void unblocking_op(void)
{
    pthread_mutex_lock(&my_mutex);
    if (sleepers > 0)                          // someone is sleeping
        pthread_cond_signal(&my_cond);
    produce_guts();                            // do something with guts
    pthread_mutex_unlock(&my_mutex);
}

My utilization of this was for a message queue, but the technique can work with any producer/consumer-type problem. Now, being the rocket scientist I am, I looked at this code and said “while loop - who needs it? The condition variable is always signaled when the unblocking op has occurred - why would the guts not be ready?”

Turns out there is a race condition, as the paper above describes. Here’s how it works:

First we must understand that the blocking operation only sleeps on the condition variable when it cannot immediately run. In other words, if our message queue has a message and the mutex is free, we just run straight through without any blocking.

Second, we must understand that waking up from the condition variable and acquiring the mutex are not atomic. They cannot be - at the instant the condition is signaled, the mutex is never free (because it is owned by the unblocking thread). Even if this were not true (I believe that signaling from outside the mutex would introduce a race condition - but this is not the race condition we are looking for), another thread could certainly own the mutex for random reasons. So by Murphy’s Law of threaded code, something is going to happen between when our signaled blocked thread wakes up and when it can actually look at the guts of our object and do something.

The problem is: what state is our object in during this time? In our message queue’s case, the message queue has a message in it. So if a second reader thread comes along and acquires the mutex before our first reader wakes up and acquires it, then the second reader will consume the message without ever sleeping, and by the time our first reader actually wakes up, the message is gone!

This is why the while loop is important. The while loop allows the first reader who has had his message stolen to go back to sleep and wait for the next one.
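
Putting it all together, here's a minimal sketch of the message-queue case (pthreads; the names are mine, not the X-Plane code):

#include <pthread.h>
#include <deque>

class message_queue {
public:
    message_queue()
    {
        pthread_mutex_init(&mMutex, NULL);
        pthread_cond_init(&mCond, NULL);
    }
    void send(void * msg)
    {
        pthread_mutex_lock(&mMutex);
        mMessages.push_back(msg);
        pthread_cond_signal(&mCond);  // wake one blocked reader, if any
        pthread_mutex_unlock(&mMutex);
    }
    void * receive(void)
    {
        pthread_mutex_lock(&mMutex);
        // The while loop is the whole point: another reader may steal
        // the message between our wake-up and our reacquiring the mutex.
        while (mMessages.empty())
            pthread_cond_wait(&mCond, &mMutex);
        void * msg = mMessages.front();
        mMessages.pop_front();
        pthread_mutex_unlock(&mMutex);
        return msg;
    }
private:
    pthread_mutex_t    mMutex;
    pthread_cond_t     mCond;
    std::deque<void *> mMessages;
};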

(An implication of all this is that our message queue based on condition variables is not “fair” - if one reader blocks on the empty queue and a second reader comes along just after it is filled, the second one may win the message without blocking, while the first one stays asleep. This is probably good for certain applications in some ways, but the starvation of the sleeping thread may be unacceptable for others.)

If there’s a lesson in all of this, I think it’s this: threads are fundamentally complex and non-trivial to code. Before using threads, think carefully about what the real benefit is, and be sure it outweighs the costs: more complex development, less debuggable code, and increased maintenance.