C!! – Redefining the native types of C++ for C bang bang | Accidental Scientist

This article is going to be a lot simpler than the other articles on C!! (which hopefully are still percolating a bit). Mainly because I want to handle something simple before I dive into deeper waters – like generators.

So if you were going to redefine C++’s type system, how would you do it?

My goals are to make the types explicit sizes where possible, and obvious where not. And some of these are (as with everything else in C!! at this point) up for debate.

By the way, simple doesn’t mean short, sadly. This is a long one.

Integer Types

Let’s start with the explicitly sized types. This should be easy.

Explicit-size Integer Types

Name                   Description
uint8, byte, char      Unsigned 8-bit integer
uint16, wchar, ushort  Unsigned 16-bit integer
uint32, uint           Unsigned 32-bit integer
uint64, ulong          Unsigned 64-bit integer
uint128                Unsigned 128-bit integer
uint256                Unsigned 256-bit integer
uint512                Unsigned 512-bit integer
uintn                  Unsigned n-bit integer. Reserved; n must be a power of 2.
int8                   Signed 8-bit integer
int16, short           Signed 16-bit integer
int32, int             Signed 32-bit integer
int64, long            Signed 64-bit integer
int128                 Signed 128-bit integer
int256                 Signed 256-bit integer
int512                 Signed 512-bit integer
intn                   Signed n-bit integer. Reserved; n must be a power of 2.

The idea behind uintn and intn is that they're for future expansion; they just block off that part of the namespace. I'm still noodling on the 128/256/512-bit integers; they might be doable with SIMD instructions, but they're not exactly general purpose. We could leave them out and stop at (u)int64. Either way, we'll reserve the syntax.
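For comparison, most of the explicit-size names above can already be expressed in today's C++ via <cstdint>. A sketch (the extra aliases from the table are noted in comments, since some of them are C++ keywords):

```cpp
#include <cstdint>

// Sketch: the explicit-size C!! names expressed as C++ aliases.
// In C!!, byte, char, wchar, ushort, uint, ulong, short, int and long
// would name these same types; they're shown as comments here because
// several are reserved words in C++.
using uint8  = std::uint8_t;   // byte, char
using uint16 = std::uint16_t;  // wchar, ushort
using uint32 = std::uint32_t;  // uint
using uint64 = std::uint64_t;  // ulong
using int8   = std::int8_t;
using int16  = std::int16_t;   // short
using int32  = std::int32_t;   // int
using int64  = std::int64_t;   // long

static_assert(sizeof(uint8) == 1 && sizeof(uint64) == 8,
              "explicit sizes must hold");
```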

Platform-specific Integer Types

Name   Description
uintr  Unsigned, aligned atomic integer
intr   Signed, aligned atomic integer

By default, uintr and intr are register-sized for a given target platform. So on x64, they’re guaranteed to be 64-bit. On x86, they’d be 32-bit. These types take the old-school meaning of int and unsigned int – that is, the native CPU word size.

By default, they also have the minimum atomic platform alignment for the type. (These are meant for speed).
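A rough C++ equivalent, assuming uintptr_t tracks the register width on the targets we care about (strictly, it's only guaranteed to be big enough to hold a pointer):

```cpp
#include <atomic>
#include <cstdint>

// Sketch: register-width integers with atomic-friendly alignment.
// On x64 these come out 64-bit; on x86, 32-bit.
using uintr = std::uintptr_t;
using intr  = std::intptr_t;

// The "minimum atomic platform alignment" falls out of std::atomic:
// an atomic of the native word is lock-free on mainstream targets.
static_assert(alignof(std::atomic<uintr>) >= alignof(uintr),
              "atomic alignment is at least the natural alignment");
```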

Backwards-compatible Integer types

Because we’re keeping compatibility with C++ and C where it makes sense, I’m not going to redefine the wheel entirely. But I’m also not going to go too crazy. As a concession to C++ programmers, let’s keep unsigned, signed, char, short, int, long. Let’s also keep wchar, but drop the _t.

I’m philosophically opposed to long long. (Remember: we’re not going for code-level backwards compatibility here, so I can drop it).

I stuck them in the table above for brevity.

Real types

The reals (as opposed to the integers) come in three flavors – floating point, fixed point, and decimal. (Yeah, I know… decimal isn’t a game-y thing, but it’s needed for completeness).

Floating-point types

Name                            Description
float16, half                   16-bit floating-point value (5 exponent, 10 fraction bits)
float32, float, single, scalar  32-bit floating-point value (8 exponent, 23 fraction bits)
float64, double                 64-bit floating-point value (11 exponent, 52 fraction bits)
float80, long double            80-bit floating-point value (15 exponent, 63 fraction bits)
float128, quadruple             128-bit floating-point value (15 exponent, 112 fraction bits)
floatn                          Reserved for the future

The proliferation of names is quite horrible, particularly for float32. We might want to cut them down to size a bit (but it’s so tempting to include scalar for symmetry with vector).

Fixed-point types

Fixed point comes in with a slightly different naming scheme:

Name       Description
fixedn_m   Signed fixed point with n two's-complement integer bits and m fraction bits
ufixedn_m  Unsigned fixed point with n integer bits and m fraction bits

The space taken by the fixed-point value will be rounded up to the nearest machine word size, unless it’s packed into a bitfield, in which case it’ll use n + m bits.
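A minimal sketch of what a fixed16_16 (16 integer bits, 16 fraction bits) could desugar to; the struct and method names are illustrative, not part of the proposal:

```cpp
#include <cstdint>

// fixed16_16: the real value times 2^16, stored in a 32-bit
// two's-complement integer.
struct fixed16_16 {
    std::int32_t raw;

    static fixed16_16 from_int(int v) { return { v * 65536 }; }  // v << 16
    int to_int() const                { return raw / 65536; }

    fixed16_16 operator+(fixed16_16 o) const { return { raw + o.raw }; }

    // Multiplication needs a 64-bit intermediate so the extra
    // fraction bits aren't lost before shifting back down.
    fixed16_16 operator*(fixed16_16 o) const {
        return { static_cast<std::int32_t>(
            (static_cast<std::int64_t>(raw) * o.raw) >> 16) };
    }
};
```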

Decimal is simply decimal32, decimal64 and decimal128 from IEEE-754 2008. We reserve decimaln for the future, like the other definitions.

SIMD Types

You could make a case here for these all being striped structs, but there are some advantages to them being native types. We need two: vector and matrix.

I’m tempted to go for a variant on HLSL syntax here:

Name                                  Description
vector2                               2 float32s (x, y)
vector3                               3 float32s (x, y, z). When promoted to vector, w is set to 0.
vector, vector4                       4 float32s (x, y, z, w)
vector2i                              Vector of 2 signed int32s
vector2u                              Vector of 2 unsigned int32s
vector3i, vector3u, vectori, vectoru  Signed/unsigned int32 vector variants
vectorh, vector2h, vector3h, …        Half-precision float vector variants
vectord, vector2d, vector3d           Double-precision variants
vectore, vector2e, vector3e           Extended-precision variants
vectorq, vector2q, vector3q           Quadruple-precision variants
matrixnxm                             n x m matrix of float32s

That said, the template-style format allows much clearer definitions, and you could always alias them to shorter names:

Name                       Description
vector<type,indices>       Declares a vector type
matrix<type,width,height>  Declares a matrix type

I think we need the latter, but we’ll define aliases for the others. (At least this way, we can easily extend the types supported for each one).

We may need ways of handling unused components (initialize to 1.0, 0.0) – this mainly comes into play when we promote vector2 and vector3 to vector4, and is needed for operations such as 3-component dot products on hardware which always performs them on all 4 components. We also need to differentiate between contiguous vector3’s in memory (unaligned), and padded vector3’s (aligned).

We can synthesize the right behavior by forcing the type (when written to memory) to be aligned via the align keyword. But this makes for a lot more typing than is ideal. It might be better to use vector4, and make one of the rules that when you load a vector3 from memory, and then cast/promote it to a vector4, the uninitialized component is always initialized to zero.
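The zero-fill promotion rule, sketched in C++ (the struct layouts are illustrative; in C!!, vector4 would also carry SIMD alignment):

```cpp
// vector3 is the contiguous/unaligned in-memory form; vector4 is the
// padded, aligned form. Promotion zero-fills w, so a 4-component dot
// product on promoted vector3s yields the 3-component answer.
struct vector3 { float x, y, z; };
struct vector4 { float x, y, z, w; };

vector4 promote(const vector3& v) { return { v.x, v.y, v.z, 0.0f }; }

float dot(const vector4& a, const vector4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```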

The other alternative here is to use a mapping to do the dirty work.

Boolean Types

This is an area that I'm in two minds about. I would like to see the default be that bool is either packed for cache/storage efficiency (so, 1-bit), or passed back in a register-width type. These do have slightly different usages though, so let's define a new type – flag – which is a single-bit value, and keep bool for the other one. We'll let packing rules handle the rest – so if you put multiple flag values side-by-side, you'll get all the benefits of using bit-fields with less of the slightly messy syntax. The compiler packs them appropriately based on the alignment requirements of everything around them, from LSB to MSB.

Name                      Description
flag                      Single bit (equivalent of bool x : 1 in C/C++)
bool8, bool16, bool32, …  Explicit-width bool (rarely used, but we can use the bool semantics here)
boolr                     Register-width bool (do we need this, or is this just bool?)

As with C++, true and false are literal values.
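The packing behaviour described above is roughly what C++ bit-fields already give you, minus the guaranteed LSB-to-MSB order. A sketch with illustrative field names:

```cpp
// Four adjacent one-bit flags share storage; the C!! flag type would
// do this implicitly, with a pinned-down LSB-to-MSB packing order
// (in C++, the bit-field layout order is implementation-defined).
struct Flags {
    bool visible : 1;
    bool dirty   : 1;
    bool locked  : 1;
    bool pinned  : 1;
};

static_assert(sizeof(Flags) <= sizeof(int), "flags pack into one word");
```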

Maybe Worth Including: Comparison Types

When performing comparisons, you typically end up with three possible outcomes: equality, less-than and greater-than – which map to 0, less than 0, and greater than 0.

We could spell this out if we wanted to, which would allow us to possibly optimize comparisons. It really depends on how good the compiler is at hoisting them out and recognizing that it can perform one comparison and branch effectively based on it. (Possibly combining those operations).
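A first-class comparison type might look like the following sketch (C++20 later standardized essentially this shape as std::strong_ordering): one comparison produces one three-valued result that every branch can reuse.

```cpp
// One comparison, three outcomes. A compiler that understands this
// type can emit a single cmp and then branch or cmov off the saved
// result, instead of re-comparing.
enum class ordering { less, equal, greater };

ordering compare(int a, int b) {
    if (a < b) return ordering::less;
    if (a > b) return ordering::greater;
    return ordering::equal;
}
```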

This is somewhat of a micro-optimization, but look at this code:

#include <tchar.h>

__declspec(noinline) int Select(int a, int b, int lt, int eq, int gt)
{
    if (a < b)
    {
        return lt;
    }
    else if (a == b)
    {
        return eq;
    }
    else //if (a > b)
    {
        return gt;
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    Select(argc, (int)argv[0], argc, argc-1, argc+1); // Trick the optimizer a little
    return 0;
}

Seems pretty simple, right?

Except it generates this assembly:

_TEXT    SEGMENT
_lt$ = 8                        ; size = 4
_eq$ = 12                        ; size = 4
_gt$ = 16                        ; size = 4
?Select@@YIHHHHHH@Z PROC                ; Select, COMDAT
; _a$ = ecx
; _b$ = edx

; 7    : {

    push    ebp
    mov    ebp, esp

; 8    :     if (a < b)

    cmp    ecx, edx
    jge    SHORT $LN3@Select

; 9    :     {
; 10   :         return lt;

    mov    eax, DWORD PTR _lt$[ebp]

; 13   :     {
; 14   :         return eq;
; 15   :     }
; 16   :     else //if (a > b)
; 17   :     {
; 18   :         return gt;
; 19   :     }
; 20   : }

    pop    ebp
    ret    0
$LN3@Select:

; 11   :     }
; 12   :     else if (a == b)

    mov    eax, DWORD PTR _gt$[ebp]
    cmp    ecx, edx
    cmove    eax, DWORD PTR _eq$[ebp]

; 13   :     {
; 14   :         return eq;
; 15   :     }
; 16   :     else //if (a > b)
; 17   :     {
; 18   :         return gt;
; 19   :     }
; 20   : }

    pop    ebp
    ret    0
?Select@@YIHHHHHH@Z ENDP                ; Select
_TEXT    ENDS

In the scheme of things, this is a tiny tiny complaint, but we end up performing two identical cmp's here. mov doesn't affect the rFlags, so the result of the first cmp is still valid when we reach the second one – somewhere, somehow, the optimizer got wrapped around the axle. (This was generated with VS 2013). What the code should look like is this:

    push   ebp
    mov    ebp, esp
    cmp    ecx, edx
    jge    SHORT $LN3@Select
    mov    eax, DWORD PTR _lt$[ebp]
    pop    ebp
    ret    0
$LN3@Select:
    mov    eax, DWORD PTR _gt$[ebp]
    cmove  eax, DWORD PTR _eq$[ebp]
    pop    ebp
    ret    0

(And you could even make the argument that it should really be this…)

    push   ebp
    mov    ebp, esp
    cmp    ecx, edx
    cmova  eax, DWORD PTR _gt$[ebp]
    cmovb  eax, DWORD PTR _lt$[ebp]
    cmove  eax, DWORD PTR _eq$[ebp]
    pop    ebp
    ret    0

… but I’ve not measure the instruction timings, so the previous version may after the cost of the jump is considered vs. the indirect loads, be faster. The big question here is: will the data be available in the cache/registers? If so, it’s probably cheaper to do the cmovXX version. Ah, I know what you’re thinking: what about branch prediction? Let’s assume that because we’re likely to be using this comparison to partition or sort data, it’s likely that branch prediction will be broken as we’re operating on a fundamentally random data stream – a good portion of the time it’s just going to plain guess wrong. It still may be cheaper to do a mov followed by the two cmovs though).

So we could make a case here for breaking out a comparison as its own type, kind of like bool. Or we could improve the optimizer so that it better understands when flags are affected by the result of an operation. (My guess here would be that because of the jump, the compiler considers the flags indeterminate on entry to [email protected], thus the repeated cmp).

Not In The Language: Trinary Bools

I’m tempted to include trinary booleans for completeness (yes/no/maybe). But to be honest they’re rare enough that you rarely need them. If you need to include them, the math is pretty simple. If you want to OR a series of bools to get a trinary boolean value, you just start at zero and increment for each bool that is true. If the value you get at the end is the same as the number of bools you’ve checked, it’s true. If the value is zero, it’s false. Any other value? It’s maybe. (Or a mixture/indefinite).
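The counting trick above, spelled out (tribool and tri_or are illustrative names, not proposed syntax):

```cpp
#include <vector>

enum class tribool { no, maybe, yes };

// Count the true inputs: all true -> yes, none true -> no,
// anything in between -> maybe (a mixture/indefinite).
tribool tri_or(const std::vector<bool>& bits) {
    int count = 0;
    for (bool b : bits) count += b ? 1 : 0;
    if (count == 0) return tribool::no;
    if (count == static_cast<int>(bits.size())) return tribool::yes;
    return tribool::maybe;
}
```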

The thing is, I’ve only really ever seen them used for UI in tri-state checkboxes. And it’s easy to implement the logic yourself (see previous paragraph). So we’ll leave it out.

Not In The Language: Complex Numbers (but in the library)

I’d like Complex numbers to be covered, but for performance reasons, I think they need to be defined outside of the native language types, and relegated to library code. The reason for this is that it’s likely to be necessary to define complex numbers either as simple vector2 tuples, or as striped structs. (I’ve seen enough FFT implementations to know that keeping the real and imaginary parts separate in memory affords performance gains in some cases).
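What "striped" means here, as a sketch (ComplexArray is an illustrative name): the real and imaginary parts live in two separate contiguous arrays rather than interleaved (re, im) pairs, which is friendlier to SIMD loads.

```cpp
#include <cstddef>
#include <vector>

// Structure-of-arrays complex storage: element i is re[i] + im[i]*i.
struct ComplexArray {
    std::vector<float> re;
    std::vector<float> im;

    explicit ComplexArray(std::size_t n) : re(n), im(n) {}

    // Element-wise complex multiply: (a+bi)(c+di) = (ac-bd) + (ad+bc)i
    void multiply_by(const ComplexArray& o) {
        for (std::size_t i = 0; i < re.size(); ++i) {
            float r = re[i] * o.re[i] - im[i] * o.im[i];
            float j = re[i] * o.im[i] + im[i] * o.re[i];
            re[i] = r;
            im[i] = j;
        }
    }
};
```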

We’ll reserve the name complex for it though, and require that every implementation include it.

Strings

Strings are implemented like OLE/Pascal strings (with a length prefix, and a terminating nul character). By default, they are stored in UTF-16 format with 16-bit characters, and the type name is string.

stringutf8 is also supplied by the language for scenarios that need it. It has 8-bit characters, and the data is stored as UTF-8.

I’m in two minds as to whether string should have a 16-bit length, or a 32-bit length parameter. 32-bit seems overkill (if your string is > 64KiB, maybe you should be using a different data type). I’m almost tempted to allow variable sizes for the length parameter, but that seems like a mess waiting to happen.
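A sketch of the layout being described, using 8-bit chars for brevity (the default string would hold 16-bit UTF-16 elements) and a hypothetical make_pstring helper:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Pascal/OLE-style string: length prefix, characters, terminating nul.
// A 16-bit length caps strings at 64Ki elements, per the trade-off
// discussed above.
struct PString {
    std::uint16_t length;  // element count, excluding the nul
    char data[1];          // flexible-array idiom: length+1 chars live here
};

// Allocate and fill a PString from a C string (caller frees).
PString* make_pstring(const char* src) {
    std::size_t n = std::strlen(src);
    PString* s = static_cast<PString*>(
        std::malloc(sizeof(std::uint16_t) + n + 1));
    s->length = static_cast<std::uint16_t>(n);
    std::memcpy(s->data, src, n + 1);  // copies the nul too
    return s;
}
```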

Type Traits

Types can have traits. They’re just constant literals embedded in the type’s namespace, and are standardized by the compiler. So uint8.max is 255, for example. (Or is it uint8::max? I’ve grown to hate the scope-resolution operator in C++. Maybe we’ll get rid of it in a later article).

Rather than spelling them out, let's just say for now that all of the type traits found in C++'s std::numeric_limits (the <limits> header) are allowed, but instead of needing an include, they just get put underneath the type's name and are provided by the compiler.

(We might also want to come up with more explicit definitions of some of the values – for example, max and min are horribly overloaded and have radically different meanings for reals vs. integer types).
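For reference, today's C++ spelling of these traits – including the overloaded meaning of min – looks like this:

```cpp
#include <limits>

// uint8.max in C!! would be the compiler-provided equivalent of:
static_assert(std::numeric_limits<unsigned char>::max() == 255,
              "8-bit unsigned max");

// The overloading problem in action: for floating-point types, min()
// is the smallest positive normal value, while the most negative
// representable value is lowest().
constexpr double smallest_normal = std::numeric_limits<double>::min();
constexpr double most_negative   = std::numeric_limits<double>::lowest();
static_assert(smallest_normal > 0.0 && most_negative < 0.0,
              "min() and lowest() mean different things for reals");
```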

What’s not covered here?

Pointers and pointer types (e.g. void*) – let's just say they work a lot like C/C++. The same goes for other literals, such as null: there's no need for nullptr, since we don't have to work around previous definitions, so we'll use null instead.

Decisions, decisions…

One thing that we need to figure out with the types is whether multiple names for the same type are a good idea when they're just aliases. The purist in me says no, get rid of the cruft; people can always define their own aliases. The realist in me says that they're fine, but we might have gone overboard a bit. I'll let this sit for a while and we can figure it out later.

This article is part of a series. The previous part is here.

About Simon Cooke

Simon Cooke is a video game developer, ex-freelance journalist, screenwriter, film-maker and all-round good egg in Seattle, WA. The views posted on this blog are his and his alone, and have no relation to anything he's working on, his employer, or anything else and are not an official statement of any kind.