C!! — Handling Endianness in the C bang bang programming language

Posted on Sunday, October 12, 2014 by Simon Cooke

One of the more annoying things that you have to deal with when writing code that targets a variety of different platforms is endianness. You see, like in Gulliver’s Travels, there are two camps in the computing world: those who break numbers up into 8-bit chunks and lay them out in memory a byte at a time starting from the least-significant byte (little endian), and those who start at the most-significant byte and work their way down to the least-significant one (big endian).

(Actually most chips these days seem to let you strap a pin to signal or ground to flip the default order, or set a bit in the BIOS to change it to whichever order you want, but most people use the default).
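To make the two byte orders concrete, here is a small C++ sketch (not C!!) that lays out a 32-bit value both ways. The helper names to_little_endian and to_big_endian are made up for illustration; they build the byte sequence explicitly, so the result is the same no matter what machine runs it.

```cpp
#include <array>
#include <cstdint>

// Little-endian layout: least-significant byte goes at the lowest address.
inline std::array<std::uint8_t, 4> to_little_endian(std::uint32_t v) {
    return {{
        static_cast<std::uint8_t>(v & 0xFF),
        static_cast<std::uint8_t>((v >> 8) & 0xFF),
        static_cast<std::uint8_t>((v >> 16) & 0xFF),
        static_cast<std::uint8_t>((v >> 24) & 0xFF),
    }};
}

// Big-endian layout: most-significant byte goes at the lowest address.
inline std::array<std::uint8_t, 4> to_big_endian(std::uint32_t v) {
    return {{
        static_cast<std::uint8_t>((v >> 24) & 0xFF),
        static_cast<std::uint8_t>((v >> 16) & 0xFF),
        static_cast<std::uint8_t>((v >> 8) & 0xFF),
        static_cast<std::uint8_t>(v & 0xFF),
    }};
}
```

For the value 0x11223344, the little-endian bytes come out as 44 33 22 11 and the big-endian bytes as 11 22 33 44.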

This is really annoying if you’re a game developer. PCs are little-endian. Old-school Macs? Xbox 360? Wii? PS3? Yeah, they’re big-endian. Network developers get it too: port numbers on the wire are always big-endian (network byte order). And you need to remember that, or you’ll have problems.

This means that you end up littering your PC-based editor code (your development environment) with code to swap the byte order during serialization – which is harder to read than it should be. And if you forget, the bugs can be difficult to find. And if you take stock code written for one platform and move it to another, you often have to go in and change all kinds of things.

Let’s fix that.

Given that we can’t get everyone to just stop messing around and agree on doing it one way, endianness should be a type modifier. And we’ll need several.

littleendian	This type is always in little-endian order.
bigendian	This type is always in big-endian order.
platformendian(target)	No endianness transformation should be applied (use the current compile target platform endianness always).
(no modifier)	Use the endianness of the platform the code is being compiled for, or allow endianness to be applied to this type as a transform.

This lets us in the simplest case take code like this in C/C++:

struct sockaddr_in {
    short sin_family;
    u_short sin_port;
    in_addr sin_addr;
    char sin_zero[8];
};

...

saServer.sin_family = AF_INET;
saServer.sin_addr.s_addr = inet_addr(localIP);
saServer.sin_port = htons(5150);

… and turn it into something like this in C!! (I’ve transformed all the other types too):

struct sockaddr_in {
    ProtocolFamily sin_family;
    bigendian uint16 sin_port;
    in_addr sin_addr;
    uint8 sin_zero[8];
};

...

saServer.sin_family = ProtocolFamily::IPv4;
saServer.sin_addr.s_addr = inet_addr(localIP);
saServer.sin_port = 5150;

(I cheated a bit here; ProtocolFamily is basically an enum class : short in C++. In C!!, we simplify that because old-school enums go away, and we only have enum class – more on that in a future post. We could also clean up in_addr if we wanted to, because we have contracts, so we can hide implementation details from client code more easily simply by using an exact contract).

So this is again a contrived example. It’s not going to cause much heartache for anyone. But hey, one of the most common bugs I’ve seen in network programming (unless you do it with any frequency) is forgetting to do that endian-swap for the wire protocol.

In this example, the rules are simple: assigning a value to a variable of the same type but with a different endianness specifier causes an implicit endian-swap of the bytes in the value as part of the assignment. (It’s tempting to require a cast here to make it explicit; I could go either way on this for endianness).
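C!! doesn’t exist yet, but the implicit swap-on-assignment rule can be roughly approximated in today’s C++ with a wrapper type. The sketch below is only an emulation under that assumption: BigEndianU16 is a hypothetical name standing in for bigendian uint16, and it handles just the 16-bit case.

```cpp
#include <cstdint>

// Hypothetical stand-in for C!!'s `bigendian uint16`: the stored bytes are
// always most-significant byte first, whatever the host byte order is.
class BigEndianU16 {
public:
    BigEndianU16() : bytes_{0, 0} {}

    // Implicit conversion from a native value: this is the "swap on assignment" rule.
    BigEndianU16(std::uint16_t native) { store(native); }
    BigEndianU16& operator=(std::uint16_t native) { store(native); return *this; }

    // Implicit conversion back to a native value.
    operator std::uint16_t() const {
        return static_cast<std::uint16_t>((bytes_[0] << 8) | bytes_[1]);
    }

    // Raw storage, exactly as it would be copied into a packet.
    const unsigned char* data() const { return bytes_; }

private:
    void store(std::uint16_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 8);    // high byte first
        bytes_[1] = static_cast<unsigned char>(v & 0xFF);  // low byte second
    }
    unsigned char bytes_[2];
};
```

With this in place, writing port = 5150 stores the bytes 0x14 0x1E without an htons call in sight, and reading the value back gives 5150 again. The cost is still there; it has just moved into the type.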

You can apply the endianness modifier to types and structs at the time of definition. (The bigendian uint16 above for example).

Endianness cascading

Endianness cascades. (Kind of like const). So when you apply it to a struct, unless you have an explicit endianness specification on a member, endianness is applied to everything in it. For example:

littleendian struct C
{
    int32 g;
    bigendian int32 h;
    platformendian() int32 i;
}

struct B
{
    int d;
    int e;
    int f;
}

struct A
{
    bigendian int16 a;
    B b;
    C c;
    littleendian int16 d;
}

This gives us a layout in memory like this:

Endianness	Type	Name
big	int16	A::a
default	int	A::b::d
default	int	A::b::e
default	int	A::b::f
little	int32	A::c::g
big	int32	A::c::h
platformendian()	int32	A::c::i
little	int16	A::d

So platformendian() is a bit weird. What’s that doing in there?

It forces that type to always have the target platform endianness, regardless of endianness being applied to it via cascading modifiers or casts.

There’s also something else missing from the endianness table above – because it’s what you get when you don’t apply a modifier. That’s “default endian”, which is the endianness of the platform you’re compiling for. It’s also a key player in the endianness cascade mechanism; if you don’t specify endianness, it’s open to modification.

Let’s say we create a new struct A, that we force to be bigendian. We can do that in the definition or when the variable is declared:

bigendian struct A
{
    bigendian int16 a;
    B b;
    C c;
    littleendian int16 d;
}

… or …

void Something()
{
    bigendian struct A myBigendianA;
}

Then the layout for A in this case is:

Endianness	Type	Name
big	int16	A::a
big	int	A::b::d
big	int	A::b::e
big	int	A::b::f
little	int32	A::c::g
big	int32	A::c::h
platformendian()	int32	A::c::i
little	int16	A::d

It pretty much just flood-fills any slots in the layout which are using the default endianness.
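Here is one way the flood-fill could be modelled, as a C++ sketch rather than anything a real C!! compiler does. The Node, Slot, resolve, flatten, layout_of and make_A names are all invented for this illustration; make_A simply rebuilds the A/B/C structs from above as data.

```cpp
#include <string>
#include <vector>

enum class Endian { Default, Little, Big, Platform };

// One node of a type layout: a scalar field, or a nested struct with members.
struct Node {
    std::string name;
    Endian declared;            // modifier written in the source, or Default
    std::vector<Node> members;  // empty for scalar fields
};

struct Slot { std::string path; Endian endian; };

// An explicit modifier always wins; only Default slots take the inherited value.
inline Endian resolve(Endian declared, Endian inherited) {
    return declared == Endian::Default ? inherited : declared;
}

// Walk the tree depth-first, pushing the effective endianness down to the leaves.
inline void flatten(const Node& n, Endian inherited, const std::string& prefix,
                    std::vector<Slot>& out) {
    const Endian effective = resolve(n.declared, inherited);
    const std::string path = prefix.empty() ? n.name : prefix + "::" + n.name;
    if (n.members.empty()) {
        out.push_back({path, effective});
        return;
    }
    for (const Node& m : n.members) flatten(m, effective, path, out);
}

inline std::vector<Slot> layout_of(const Node& root) {
    std::vector<Slot> out;
    flatten(root, Endian::Default, "", out);
    return out;
}

// The structs from the post; `outer` is the modifier applied to A as a whole.
inline Node make_A(Endian outer) {
    Node b{"b", Endian::Default, {{"d", Endian::Default, {}},
                                  {"e", Endian::Default, {}},
                                  {"f", Endian::Default, {}}}};
    Node c{"c", Endian::Little, {{"g", Endian::Default, {}},
                                 {"h", Endian::Big, {}},
                                 {"i", Endian::Platform, {}}}};
    return Node{"A", outer, {{"a", Endian::Big, {}}, b, c, {"d", Endian::Little, {}}}};
}
```

Calling layout_of(make_A(Endian::Default)) reproduces the first layout table, and layout_of(make_A(Endian::Big)) reproduces the second: only the three members of b change, because everything else already carries an explicit or inherited modifier.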

Endianness and constant literals

Constants will get covered in a future post, but they’re intended to be more dynamic than C++/C constants. If you specify a constant and assign it to a variable of a different endianness, it’ll automatically get swapped at compile-time if necessary.
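The nearest thing in today’s C++ is probably a constexpr function, so here is a sketch of a compile-time swap along those lines. swap32 and the kMagic constants are illustrative names, not part of C!!.

```cpp
#include <cstdint>

// Reverse the four bytes of a 32-bit value. Being constexpr, it can be
// evaluated by the compiler when the argument is a constant.
constexpr std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

constexpr std::uint32_t kMagicNative  = 0x11223344u;
constexpr std::uint32_t kMagicSwapped = swap32(kMagicNative);

// If this compiles, the swap was computed at compile time, not at runtime.
static_assert(kMagicSwapped == 0x44332211u, "constant was swapped by the compiler");
```

The difference in C!! would be that the compiler decides whether the swap is needed from the endianness of the destination type, instead of you calling a function by hand.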

Endianness and assignment

As mentioned above, endianness swaps occur at runtime via assignments. I’m tempted to force them to be used with the transformation assignment operator (as they have an associated cost) or always via a cast (again, to make the cost explicit), but I’m open to debate here. Besides, transformation assignment feels like something you deal with when you’re usually assigning multiple things, so it doesn’t quite fit.

I think that a cast is necessary; that way we can make endianness part of strict typing for method signatures.

Endianness and type

Endianness is always part of a type. Types have to match unless you modify them by casting. So if a method takes a bigendian value as a parameter, you can’t provide a different endian type and have it compile as-is.
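As a rough picture of what strict typing buys you, here is a C++ sketch where the two endiannesses are simply unrelated types and the only way across is a visible conversion. LittleU16, BigU16, make_little, to_big and read_port are hypothetical names for this example only.

```cpp
#include <cstdint>
#include <type_traits>

struct LittleU16 { unsigned char bytes[2]; };  // least-significant byte first
struct BigU16    { unsigned char bytes[2]; };  // most-significant byte first

inline LittleU16 make_little(std::uint16_t v) {
    return LittleU16{{static_cast<unsigned char>(v & 0xFF),
                      static_cast<unsigned char>(v >> 8)}};
}

// The explicit, visible "cast": it reverses the two bytes.
inline BigU16 to_big(LittleU16 v) {
    return BigU16{{v.bytes[1], v.bytes[0]}};
}

// A function that only accepts big-endian values.
inline std::uint16_t read_port(BigU16 port) {
    return static_cast<std::uint16_t>((port.bytes[0] << 8) | port.bytes[1]);
}

// Passing a LittleU16 straight to read_port would not compile.
static_assert(!std::is_convertible<LittleU16, BigU16>::value,
              "no implicit conversion between endiannesses");
```

So read_port(to_big(make_little(5150))) compiles and returns 5150, while read_port(make_little(5150)) is rejected by the compiler.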

I’m not sure how endianness affects methods on structs. I feel like methods and functions always need to be explicitly endian because it’s a storage thing, and you shouldn’t be arbitrarily endian-swapping every function in your code – only where it’s needed on assignment. Also, arbitrary swapping causes havoc for writing code that has to do anything, because the target platform has its own endianness.

So I’m tempted to make it so that methods don’t participate in endianness cascades. Besides, it’s going to be hard to come up with syntax for a method that can change its endianness without messing with the return type, or without resorting to kludgy function-pointer-like syntax.

Endianness and platform()

The platform keyword I mentioned in a previous post can be used to apply endianness as one of its modifications to a type. Each platform has a defined endianness. (Think of platform as a grab-bag of compiler-defined modifiers, traits and constants that can be applied to any other type). We’ll go over it again when I get to fully describing the behavior of the platform modifier in a future post.

platformendian is still something I’m figuring out. I think the keyword used alone should be “target platform”, and with a named platform (e.g. platformendian(Xbox360)), it uses the platform endianness value defined for that platform. More on that when I handle platform properly.
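Purely as a guess at what "a platform endianness value defined for that platform" might look like, here is a C++ sketch that uses the platforms mentioned earlier in this post. The Platform enum and endianness_of function are invented for illustration; the real platform modifier would carry a lot more than this.

```cpp
enum class Endian { Little, Big };

// Only the platforms named earlier in the post; a real list would be longer.
enum class Platform { PC, Xbox360, Wii, PS3 };

// Compile-time lookup of a platform's byte order: PCs are little-endian,
// and the three consoles listed here are big-endian.
constexpr Endian endianness_of(Platform p) {
    return p == Platform::PC ? Endian::Little : Endian::Big;
}

static_assert(endianness_of(Platform::Xbox360) == Endian::Big,
              "platformendian(Xbox360) would resolve to big-endian");
static_assert(endianness_of(Platform::PC) == Endian::Little,
              "platformendian(PC) would resolve to little-endian");
```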

Edit: 10/15/2014:

Other kinds of Endianness...

There's another way of expressing endianness, which some CPU manufacturers use (thanks, Dushan Hanisko, for pointing this out). That is, the unit that gets re-ordered is a word rather than a single byte. Within each unit the bits keep their usual order, with the most-significant bit on the left (just as they do within a byte in byte-atomic endianness), but the units themselves can be of varying size - e.g. 16 bits instead of 8 bits - and it's those units that we re-order.

To handle this case, we'll need to add a width parameter to the big and little endianness specifiers. Rather than have it be part of the keyword (e.g. bigendian16), we'll put it as a parameter in parentheses after the endianness keyword (that is, bigendian(16) ). By default, it's always 8, and you can leave it off entirely if that's what you mean - because that's the ultra-common case.
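A swap with a width parameter might look something like the following C++ sketch. reverse_units is a hypothetical helper, and it assumes the width is 8, 16 or 32 bits and that the value is 64 bits wide; it reverses the order of the units but leaves the contents of each unit untouched.

```cpp
#include <cstdint>

// Reverse the order of `unit_bits`-wide blocks inside a 64-bit value.
// Assumption: unit_bits is 8, 16 or 32. With 8 this is an ordinary byte swap.
inline std::uint64_t reverse_units(std::uint64_t v, unsigned unit_bits) {
    const std::uint64_t mask = (std::uint64_t{1} << unit_bits) - 1;
    std::uint64_t out = 0;
    for (unsigned shift = 0; shift < 64; shift += unit_bits) {
        // Peel units off the low end of v and push them onto the low end of
        // out, so the first unit taken ends up in the highest position.
        out = (out << unit_bits) | ((v >> shift) & mask);
    }
    return out;
}
```

With 16-bit units, 0x1122334455667788 becomes 0x7788556633441122: the four 16-bit blocks swap places, but the two bytes inside each block stay in the same order. With 8-bit units the same value becomes 0x8877665544332211.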

Edit: 10/19/2014: Fixed a typo, and clarified a point. (Meant PS3, wrote PS4. And I should say that it’s the old Macs that are big-endian).

This article is part of a series. The previous post is here.

Simon Cooke is an occasional video game developer, ex-freelance journalist, screenwriter, film-maker, musician, and software engineer in Seattle, WA.

The views posted on this blog are his and his alone, and have no relation to anything he's working on, his employer, or anything else and are not an official statement of any kind by them (and barely even one by him most of the time).