C!! – Data Tables and value maps in C bang bang (kind of)

One of the things that I passionately hate about software development is defining the same data in multiple places. Every extra copy is another chance for the copies to drift out of sync, which massively increases the odds of an annoying, hard-to-solve bug for no good reason.

The easiest way to explain this is with error codes. For error codes, you need:

  • An enum type of some kind, so that you can pass the error around (unless, of course, you’re using something like an HRESULT, but that has its own thorny class of problems akin to 4CC-registration on ye olde Mac OS).
  • A unique numeric value.
  • A text string that can be used to show the user.
  • A text string for the enum value so that you’re not guessing which error you’re actually dealing with.

    One way we could do this is to give enums (and other types) a string representation. (And we should probably do that anyway; we can generate that data on the fly when we first use it in code.) But that only solves the enum-name case, and it doesn’t really help with the human-readable text string lookup.

    So let’s make this easy – and define a data table.

    datatable
    {
      columns
      {
         enum ErrorCode : name,  // column 0
         enum ErrorCode : value, // column 1
         stringutf8[] ErrorText : value   // column 2
      }
      calculated
      {
         stringutf8[] ErrorCodeText = enum ErrorCode : name
      }
      mapping
      {
         ErrorCode ==> ErrorCodeText;
         ErrorCode ==> ErrorText;
      }
      ordering
      {
         ErrorText : for mapping ErrorCode;
         ErrorCodeText : for mapping ErrorCode;
      }
      data
      {
         Ok, 0, "Everything went fine";
         OnFire, 1, "Something is on fire";
         Unknown, -1, "Unknown Error";
         Moldy, 4, "The drive is moldy, and should be washed in a bleach solution";
         Psychological, 3, "The drive is depressed.";
      }
    }

    This is probably the least C++-like thing we’ve done with C!! so far. I should be clear: I’m totally not married to this syntax. What we have here is basically a compile-time transform that generates the following output:

    enum ErrorCode
    {
       Ok = 0,
       OnFire = 1,
       Unknown = -1,
       Moldy = 4,
       Psychological = 3
    };

    stringutf8[] ErrorText =
    {
       "Unknown Error",
       "Everything went fine",
       "Something is on fire",
       "The drive is depressed.",
       "The drive is moldy, and should be washed in a bleach solution"
    };

    stringutf8[] ErrorCodeText =
    {
       // same order as ErrorText, so both lookups can share the same index math
       "Unknown", "Ok", "OnFire", "Psychological", "Moldy"
    };

    stringutf8 __ErrorCodeToErrorText( ErrorCode value )
    {
       if ( value >= -1 && value <= 1 )
       {
          return ErrorText[value + 1];
       }
       else if ( value >= 3 && value <= 4 )
       {
          return ErrorText[ value ];
       }
       else return null; // or throw exception, or assert...
    }

    stringutf8 __ErrorCodeToErrorCodeText( ErrorCode value )
    {
       if ( value >= -1 && value <= 1 )
       {
          return ErrorCodeText[value + 1];
       }
       else if ( value >= 3 && value <= 4 )
       {
          return ErrorCodeText[ value ];
       }
       else return null; // or throw exception, or assert...
    }


    So what are we actually trying to do here?

    We’re defining our error-code enum and, at the same time, two arrays of UTF-8 strings: one for the error code enum names (generated automatically, so you don’t list them), and one for the error code text.

    We’re also generating a mapping between the ErrorCode numeric values and the indices of the strings in the text arrays. And we’re allowing the compiler to re-order those strings to make the mapping more efficient.
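To make the re-ordering idea concrete, here is a minimal C++ sketch of one way a compiler could do it: sort the rows by numeric value, then split them into gap-free runs so each lookup is a range check plus an offset. This is an illustration under assumptions, not the C!! implementation; the names `Row`, `Run`, `PackedTable`, `packTable`, `lookup` and `makeErrorTextTable` are all invented here, and the sketch assumes every code appears only once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of the hypothetical table: numeric code plus user-facing text.
struct Row {
    int value;
    std::string text;
};

// A gap-free range of codes [first, last] whose strings start at index
// `base` in the reordered string array.
struct Run {
    int first;
    int last;
    std::size_t base;
};

struct PackedTable {
    std::vector<std::string> texts; // strings, reordered by ascending code
    std::vector<Run> runs;          // one entry per gap-free range of codes
};

// Sort the rows by code, then split them into gap-free runs.
PackedTable packTable(std::vector<Row> rows) {
    std::sort(rows.begin(), rows.end(),
              [](const Row& a, const Row& b) { return a.value < b.value; });

    PackedTable table;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        // Assumption: every code appears at most once in the table.
        assert(i == 0 || rows[i - 1].value != rows[i].value);

        if (!table.runs.empty() && table.runs.back().last + 1 == rows[i].value) {
            table.runs.back().last = rows[i].value;                // extend the current run
        } else {
            table.runs.push_back({rows[i].value, rows[i].value, i}); // start a new run
        }
        table.texts.push_back(rows[i].text);
    }
    return table;
}

// Returns nullptr for codes that are not in the table.
const std::string* lookup(const PackedTable& table, int value) {
    for (const Run& run : table.runs) {
        if (value >= run.first && value <= run.last) {
            return &table.texts[run.base + static_cast<std::size_t>(value - run.first)];
        }
    }
    return nullptr;
}

// The rows from the datatable, in the order they were written there.
PackedTable makeErrorTextTable() {
    return packTable({
        {0, "Everything went fine"},
        {1, "Something is on fire"},
        {-1, "Unknown Error"},
        {4, "The drive is moldy, and should be washed in a bleach solution"},
        {3, "The drive is depressed."},
    });
}
```

For the five rows in the example this produces two runs, one covering -1 to 1 and one covering 3 to 4, which corresponds to the two `if` branches in the generated lookup functions. A real compiler would do this work at compile time and emit the branches directly rather than looping over a run list.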

    datatable is the only keyword pollution here; once we’re inside the braces, the keywords are context specific.

    Is this valuable?

    To be honest, I can’t tell. It’s something I’ve wanted. (It was actually the programming issue that sent me down this path in the first place and started me thinking about better ways to handle it.) But at the same time, the syntax is kludgy. It feels more like an editor-side problem than a code-side problem, something that a macro language like T4 is intended to handle.
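For comparison, the closest thing available in plain C++ today is probably the X-macro pattern, where the preprocessor plays the role of the macro language. The sketch below assumes C++17; the macro name `ERROR_CODE_TABLE` and the functions `errorCodeName` and `errorText` are made up for this example and are not part of C!!.

```cpp
#include <string_view>

// Single source of truth: one X(name, value, text) row per error code.
#define ERROR_CODE_TABLE(X) \
    X(Ok,            0, "Everything went fine") \
    X(OnFire,        1, "Something is on fire") \
    X(Unknown,      -1, "Unknown Error") \
    X(Moldy,         4, "The drive is moldy, and should be washed in a bleach solution") \
    X(Psychological, 3, "The drive is depressed.")

// Expand each row into an enumerator.
enum class ErrorCode : int {
#define X(name, value, text) name = value,
    ERROR_CODE_TABLE(X)
#undef X
};

// Expand each row into a case returning the enumerator's name as text.
constexpr std::string_view errorCodeName(ErrorCode code) {
    switch (code) {
#define X(name, value, text) case ErrorCode::name: return #name;
        ERROR_CODE_TABLE(X)
#undef X
    }
    return {}; // empty view for values that are not in the table
}

// Expand each row into a case returning the user-facing message.
constexpr std::string_view errorText(ErrorCode code) {
    switch (code) {
#define X(name, value, text) case ErrorCode::name: return text;
        ERROR_CODE_TABLE(X)
#undef X
    }
    return {}; // empty view for values that are not in the table
}

// Compile-time checks that the three expansions agree with the table.
static_assert(static_cast<int>(ErrorCode::Unknown) == -1);
static_assert(errorCodeName(ErrorCode::OnFire) == "OnFire");
static_assert(errorText(ErrorCode::Ok) == "Everything went fine");
```

This keeps the data in one place, but it shares the kludginess complaint: the table is a macro, the syntax is arbitrary, and any re-ordering or range packing is left to whatever the compiler does with the `switch`.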

    Bait and Switch: DataTable doesn’t go in the C!! language

    So with that in mind, we’re not going to put this in the language. I want to. I just can’t think of a good way to do it that doesn’t go completely out of the confines of a well-defined language, with well-defined rules, and non-arbitrary syntax. I include it here for completeness, and because the ideas might be useful for something else. We’ll probably return to it once we think about editors again.

    This article is part of a series. The previous part is here.

  • About the author

    Simon Cooke is an occasional video game developer, ex-freelance journalist, screenwriter, film-maker, musician, and software engineer in Seattle, WA.

    The views posted on this blog are his and his alone, and have no relation to anything he's working on, his employer, or anything else and are not an official statement of any kind by them (and barely even one by him most of the time).
