Debugging By ESP… | Accidental Scientist

Debugging By ESP…

I had one of the strangest experiences I’ve had in a long time today. Almost up there with that big switch connected to nothing that when you turn it off crashes the computer, that you hear about in all those old hacker tales…

We’ve had a bug that’s been really tricky to find and hunt down at work. On and off, it has caused lots of the monsters and other animated meshes in the game to vanish in a peculiar way, seemingly completely at random. This has been a problem for a while now, on and off, for at least 4 months if not longer. It’s so intermittent though, that it’s really tough to figure out what’s going on.

Well, yesterday one of the developers finally managed to track down a repro case. After playing around with it for a while, I started getting a hunch that it might have something to do with how we do the lighting on the PS2 version of the game. Messed around with some of our debugging flags, and it seemed to corroborate it; depending on where you placed the camera in the scene, and depending on which lights were visible, bodies and heads would pop into and out of view.

I figured it couldn’t hurt to take a look and see if I could track it down. A complete stab in the dark, but I’m a pretty good debugger, so why not?

So I went back to my desk, fired up Visual Studio, and loaded up the game engine code.

The project opened with the exact file the bug was in on top. The correct function was on screen. And my cursor was sitting on the line of code that was wrong.

Needless to say, this was a bit shocking. I stared at the code for a bit, then a bit more, and then on a hunch decided to check and see if the code was causing a problem.

The basic nature of the bug is this:

class RenderState {

enum BufferIndex {
PointLightBufferID,
PointLightCount,
DirectionalLightBufferID,
DirectionalLightCount,
SpotLightBufferID,
SpotLightCount,
MaxBufferIndex = 8
};

unsigned char m_LastLightingPass[MaxBufferIndex];
unsigned char m_CurrentLightingPass[MaxBufferIndex];

void SetupLightingPass()
{
if (*((unsigned long*)m_LastLightingPass) == *((unsigned long*)m_CurrentLightingPass))
return;
*((unsigned long*)m_LastLightingPass) = *((unsigned long*)m_CurrentLightingPass);
MoreSetupForLightingPass();
}

Ok, so this is a bit on the hokey side, but it’s only for illustrative purposes. The code does a comparison of the two lighting state buffers, and a copy of one over the other if they’re different (in which case it’ll also do more setup work). For speed purposes, it does the comparison in 4-byte chunks, instead of a byte at a time. Which is great, because that’s the width of the bus of the CPU, so it’s really happy to do that comparison that way. It’s nice and efficient.

Unfortunately, there’s a bug here. The problem is that at some point, spotlights were added, but the optimization wasn’t updated to take it into account. This is a pretty easy and understandable mistake to make, but it’s a bugger to actually find.

(In case you’re wondering, probably the easiest way to fix it is like this:

if (((unsigned long*)m_LastLightingPass)[0] == ((unsigned long*)m_CurrentLightingPass)[0] && ((unsigned long*)m_LastLightingPass)[1] == ((unsigned long*)m_CurrentLightingPass)[1])
return;
((unsigned long*)m_LastLightingPass)[0] = ((unsigned long*)m_CurrentLightingPass)[0];

((unsigned long*)m_LastLightingPass)[1] = ((unsigned long*)m_CurrentLightingPass)[1];

But unfortunately that doesn’t get around the problem that if the code changes again, the same problem could still arise. With a smart compiler, one could expect it to make appropriate optimizations for us. Maybe hint to it that the data is aligned to a 4-byte boundary, and the default copy and comparison operators would be set appropriately. But we’re not necessarily using a smart compiler to compile the code, so these kinds of things need to be added by hand.

(For the record, like most software studios, we’re using a branched version of gcc for the PS2 builds).

Visual Studio, though, lets you make those kinds of declarations in code – I’ve not checked if it writes out appropriately optimized assembly code though. I’m willing to bet that it does.

Anyway… this is all getting away from the point. The point is… my IDE opened to the exact line of code that was faulty. All it took was for me to look at it, and, well, follow my nose.

Needless to say, the chances of finding this bug was like finding a needle in a haystack. (And, for the record, it still remains to be seen whether or not it actually fixes the problem). For quite some time after finding it, I actually felt completely buzzed, energised, and more than a little weirded out. I even felt kind of faint (although that may have been a mixture of lack of food and my nicotine patch wearing off). How on earth could a computer do this for me?

Must have been my lucky day. Either way, it’s freakin’ weird. Maybe I’d seen the code, like an idiot savant. Maybe it was subliminal! Or perhaps there’s a ghost that visits coders who are clutching at straws! Maybe I was meant to find that bug…

… and now, the scientist in me takes over, and has to find out what really happened.

Well, let’s look at the evidence – the IDE opens whatever files I had open last time I was in the project. So I must have had that file open. And it opens scrolled to the last place that it was open… so I must have been looking at that function.

I got all the way home from work before I figured it out. Yesterday, I’d been coordinating some changes to the way we render lights on animated meshes, and there are two files in which the lighting coefficients get set up. One of them – you guessed it – was this file. And, yep, guessed again, it’s in that function too. So before I filed the task for one of the team to work on, I did my usual and investigated it and what it would require. After finding out from one of the engine programmers the appropriate places to make the changes we needed, I opened up those files to verify that everything was in order, and that it wasn’t a tremendously risky change.

… and just by chance, the bug was in the same function as the last place I was checking.

So, nope, not ESP. Not even anything weird – just a completely random coincidence. Serendipity, if you will.

But I still think that I may have burned through some of the karma I’ve built up for a while for this one. I’d better not walk under any ladders for a while 🙂

About Simon Cooke

Simon Cooke is a video game developer, ex-freelance journalist, screenwriter, film-maker and all-round good egg in Seattle, WA. The views posted on this blog are his and his alone, and have no relation to anything he's working on, his employer, or anything else and are not an official statement of any kind.
facebook comments