A couple of friends pointed out that contracts really need to support true duck-typing to be useful in all circumstances. (Not everyone cares all the time about ultra-high performance code). Yet I still want to make sure that they’re high performance if you want to pass them around as types. So let’s see if we can find a happy medium.
The criteria I’d like to satisfy are:
- If you pass something around as a contract, cheap operations are seamless (you don’t have to think about them), but expensive operations should raise a red flag. (This is something editors could help with, but it’d be nice if it was visible at a glance for raw text).
- True duck-typing (a method takes a parameter which is a contract type, and the method thinks that it’s a true type) is possible, and relatively simple to do.
- If you really need an efficient version of the code, you can generate one. So no dispatching/looking up offsets if you don’t need to; there’s a compile-time mapping you can employ.
I think this is doable, through a combination of things.
Firstly, when you define a contract, an implicit data-only struct is defined. (It has to be data-only, as any methods defined in the contract can’t be set up properly). This will also be a bit messy if C#-style properties come into play, and kind of limits things a bit, but it’s needed for value semantics instead of reference semantics. I’m inclined to let it lie exactly as it is; if you’re using a contract, you’re dealing with explicit fields only – pointers and methods go by the wayside. They can always be added by some other mechanism later, and it’s a useful construct even without them.
Secondly, if you’re passing in a variable as a reference (or pointer) which uses a contract, we also pass in a hidden pointer which is a table of offsets. The first variable in the contract isn’t in this table. (We adjust the pointer to the variable we’re passing around to the first parameter in the contract – although for performance and cache coherency, we might define that table on the fly depending on usage – more later). The second and following variables are all defined as offsets from that pointer (signed integer, so they can be behind it as well, but again, for cache-coherency we might mess with this).
At link time, when this table is generated, we might even statically verify the largest offset needed for all types used via the contract – in which case, they could reduce down to even single-byte offsets. (The worst case should be no more than a 32-bit offset – a signed 32-bit offset reaches 2GB in either direction, which should be enough for any single structure).
(The observant reader might notice that this is very similar to the way striped pointers work. Score! There might be a way to reduce the effort required to implement this here).
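To make the offset-narrowing idea concrete, here is a minimal C++ sketch of picking the smallest signed integer type that can hold every offset a contract needs. It assumes the compiler or linker already knows the smallest and largest offset across all types used via the contract; the names fits and NarrowestOffset are made up for this illustration and are not part of the language described here.

```cpp
#include <cstdint>      // std::int8_t, std::int16_t, std::int32_t
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::conditional

// True when every offset in [lo, hi] is representable in Int.
template <typename Int>
constexpr bool fits(long long lo, long long hi) {
    return lo >= std::numeric_limits<Int>::min() &&
           hi <= std::numeric_limits<Int>::max();
}

// Hypothetical helper: the narrowest signed type covering [Lo, Hi].
// Anything wider than 16 bits falls back to 32 bits without a further
// check, mirroring the "worst case is a 32-bit offset" assumption.
template <long long Lo, long long Hi>
using NarrowestOffset = typename std::conditional<
    fits<std::int8_t>(Lo, Hi), std::int8_t,
    typename std::conditional<fits<std::int16_t>(Lo, Hi),
                              std::int16_t, std::int32_t>::type>::type;

// Example: a contract whose members are never more than 8 bytes away
// from the base pointer only needs one byte per table entry.
using SmallTableEntry = NarrowestOffset<-8, 8>;
static_assert(sizeof(SmallTableEntry) == 1, "single-byte offsets suffice");
```

The range is signed on both ends because members may sit behind the base pointer as well as in front of it.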
Thirdly, if you really really need the most efficiency, you should be able to do it via templates. Template syntax tends to be nasty and hard to remember (hey, I hate writing template specializations; I have to look up the mechanism every time I want to use one), so it’s very tempting to do this by just saying that if you specify template ContractName value as a parameter, it’ll templatize the method for the contract-to-type mapping that’s actually being used. But to be honest, we could just get away with using a function template as-is here. We will need to limit the allowed types to those that honor the contract, though, so the template syntax needs a way to express that.
A bit of contract syntax
I’m not sure if I’ve spelled it out entirely yet, but the whole loose/contiguous/strict/exact things are overridable modifiers to the contract specification. So if you define a contract(loose) ContractName, then where it’s used, you can always override that by saying (strict)ContractName where you’re using it. (This applies to most of the modifiers you’ll come across in this language; for example, if you want to take a struct that’s defined as little-endian and make it big-endian, you can just stick (bigendian) to the left of the typename, and it’ll be the big-endian version that you get. Warning: this may need to be modified to play well with casting.)
Because contracts are named, they get entered into the symbol table like anything else, and aren’t allowed to collide with other names when defined. This also means that you only need to use the contract keyword when you’re defining a new one.
If you want a template type parameter to only accept types that honor a contract, you would declare it like this:
template< typename T : honors ContractName>
If you want to use the contract’s implicit, compiler-generated data-only struct, you can use it in your code like any other struct:
… although in general, you might want to be careful here. It’s likely to be a bad idea to do this; it’s only included for edge cases (functions called by functions which return data to a reference to a type which honors the contract, for example). And it’ll make the syntax for duck-typing variables a bit more complicated than it needs to be.
So we have an ABI boundary: the ABI where you want to use the struct version is different from the one where the contract should be used.
Maybe if you want to explicitly create the struct version we should spell it out:
That feels a bit better. I’m still wary about the collision with cast syntax though. It’s also a bit redundant, but at least it lets you explicitly pass the implicit struct for MyContract around by reference rather than forcing it through the duck-typing translation table if you absolutely need it.
Parameter passing by contract – the cheap version
If you pass a parameter into a method by contract, the contract is exact, and it’s being passed as a pointer or a reference, then the compiler just passes the pointer around.
Even better, if the loose, contiguous or strict version of the contract mapping just so happens to be identical for this type to what would be needed for exact, the compiler will do the same.
(This is where it gets a bit dangerous – we only guarantee the performance for exact contract mappings, but for the others you might get better perf than you expect. Anything that’s not explicit in the code is a potential source of bad practices, but in this case, because the hidden behavior might be unexpectedly better performance, I’m okay with it).
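As a rough illustration of the "just so happens to be identical" check, here is a C++ sketch of the comparison a compiler might make. It assumes a contract declaring float x; float y; and invents PositionData as the implicit data-only struct; TaggedVec, Scattered and member_gap are also made-up names for this example only.

```cpp
#include <cstddef>  // offsetof, std::size_t, std::ptrdiff_t

// Hypothetical implicit data-only struct for a contract: float x; float y;
struct PositionData { float x; float y; };

// Two illustrative concrete types that both have the contract's members.
struct TaggedVec { int tag; float x; float y; };   // x and y adjacent, in order
struct Scattered { float x; int pad; float y; };   // y is further from x

// Signed distance from the first contract member to the second.
constexpr std::ptrdiff_t member_gap(std::size_t first, std::size_t second) {
    return static_cast<std::ptrdiff_t>(second) -
           static_cast<std::ptrdiff_t>(first);
}

constexpr std::ptrdiff_t kExactGap =
    member_gap(offsetof(PositionData, x), offsetof(PositionData, y));

// When the gap matches, a pointer to x inside the object lines up with the
// exact layout, so no offset table would be needed for that type.
constexpr bool kTaggedVecIsCheap =
    member_gap(offsetof(TaggedVec, x), offsetof(TaggedVec, y)) == kExactGap;
constexpr bool kScatteredIsCheap =
    member_gap(offsetof(Scattered, x), offsetof(Scattered, y)) == kExactGap;
```

On a typical platform with 4-byte int and float, TaggedVec qualifies for the cheap path and Scattered does not. A real check would also compare member types and cover every member of the contract, not just two.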
Parameter passing by contract – the expensive (in terms of runtime performance) version
Again, we’re dealing with pointers, const references and references here – not by value. But we need a mapping between the actual type (known at compile time from the calling function) to the duck-type that the function understands.
So alongside the pointer that the compiler passes in, we’ll pass a pointer to a table of offsets to the variables (in the order they’re declared in the contract). The base pointer is a pointer to the first matching member in the contract.
If we want to get really cheap, we can generate a different table depending on the usage in the function. This gets messier if that function calls other functions, but worst case we end up with the full contract being specified as offsets.
We can also do some analysis across the types this is used with, and change the width of the offset values as needed to reduce the footprint of the table.
The pointer to the table is passed as the next parameter after the pointer to the base.
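The calling convention described above can be hand-written in C++ to see what the compiler would be doing behind the scenes. This is only a sketch under stated assumptions: the contract is taken to declare float x; float y; in that order, and Particle, Sprite, SumXY, load_float and the two table names are all invented for the example.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // std::int32_t
#include <cstring>  // std::memcpy

// Two unrelated types that both have the contract's members, laid out differently.
struct Particle { int id; float x; float mass; float y; };
struct Sprite   { float y; char tag; float x; };

// One table per concrete type. The first contract member (x) is the base,
// so the table only holds the signed offset from x to y.
constexpr std::int32_t kParticleOffsets[] = {
    static_cast<std::int32_t>(offsetof(Particle, y)) -
    static_cast<std::int32_t>(offsetof(Particle, x)),
};
constexpr std::int32_t kSpriteOffsets[] = {
    static_cast<std::int32_t>(offsetof(Sprite, y)) -
    static_cast<std::int32_t>(offsetof(Sprite, x)),  // negative: y is behind x
};

// Read a float from raw bytes without aliasing problems.
inline float load_float(const unsigned char* address) {
    float value;
    std::memcpy(&value, address, sizeof value);
    return value;
}

// The duck-typed function: base pointer first, table pointer as the next parameter.
float SumXY(const unsigned char* base, const std::int32_t* offsets) {
    return load_float(base) + load_float(base + offsets[0]);
}

// What a call site would lower to for each concrete type.
float SumXY(const Particle& p) {
    return SumXY(reinterpret_cast<const unsigned char*>(&p) + offsetof(Particle, x),
                 kParticleOffsets);
}
float SumXY(const Sprite& s) {
    return SumXY(reinterpret_cast<const unsigned char*>(&s) + offsetof(Sprite, x),
                 kSpriteOffsets);
}
```

The same compiled body of SumXY serves both types; only the base adjustment and the table differ per call site, which is the runtime cost this section is warning about.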
Template specialization when passing by contract – the expensive (in terms of compile time/I$ footprint) version
As mentioned above, we might as well re-use the template mechanism here. It has all the usual costs of templates, except we restrict the allowed types to those which honor the contract. Example:
template< typename T : honors MyContract> void MyFunction( T& parameter )
… do something to T
This behaves exactly like the expensive duck-typing version, but without the table-lookup overhead on each parameter, because it’s cheating, and generating code explicitly for it.
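For comparison, the honors restriction can be approximated in today’s C++ with a type trait and enable_if. This is a sketch, not the proposed syntax: it assumes MyContract declares float x; float y;, and the trait name honors_my_contract and the sample types are invented here. A C++20 concept would be a tidier spelling of the same idea.

```cpp
#include <type_traits>  // std::true_type, std::false_type, std::enable_if, std::is_same

// Primary template: by default a type does not honor the contract.
template <typename T, typename = void>
struct honors_my_contract : std::false_type {};

// Matches only when T has members x and y, both declared as float.
template <typename T>
struct honors_my_contract<
    T, typename std::enable_if<std::is_same<decltype(T::x), float>::value &&
                               std::is_same<decltype(T::y), float>::value>::type>
    : std::true_type {};

// Stand-in for: template< typename T : honors MyContract> void MyFunction( T& parameter )
// Member accesses compile to fixed offsets per instantiation, so no table lookup.
template <typename T,
          typename std::enable_if<honors_my_contract<T>::value, int>::type = 0>
void MyFunction(T& parameter) {
    parameter.x += parameter.y;
}

struct Particle  { int id; float x; float mass; float y; };  // honors the contract
struct Unrelated { int x; };                                 // wrong type, no y
```

Calling MyFunction with an Unrelated object fails to compile, which is the restriction the honors clause is meant to give, at the usual template cost of one instantiation per type.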
Can contracts inherit from one another?
Sure, why not for completeness. It’s treated as simple concatenation though, and I don’t think we support multiple inheritance – it’s too messy. (Except for the loose case, which means you could end up with a lot of cases where it just plain wouldn’t work… That feels wrong to me. Better to just prohibit multiple inheritance).
Edit 1: Clarified that it’s a pointer to the offset table for on-the-fly duck-typing, not passing the table itself.