In previous posts, I mentioned binding schemes and advocated using dynamic binding. I hereby retract that statement (at least partially) and revisit the topic in this article.
Consider how object-oriented languages express the association between data and the procedures that operate on it, and how they invoke those procedures on an object: they make procedures "members" of the class, just as the class has member variables. Member functions (a.k.a. methods) that refer to member variables do so implicitly by dereferencing a pointer to the current object. Inside the method definition, C++ expresses that as "this->mVar", where "this" is an implicit pointer variable that contains the address of the object.
Callers invoke methods using the syntax "object.Method()", where the address of "object" gets bound to "this" upon invocation. C++ takes this one step further and allows programmers to omit the "this->" portion of the syntax, so that it appears as though the object itself is inside the scope of the function definition and the member variables are dynamically bound at runtime. (In fact, in a previous post I pointed out the similarity between this sort of binding and dynamic binding.) Object-oriented parlance calls this invocation technique "message passing" -- the caller passes the Method message to the object, which receives and handles that message.
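To make the mechanics concrete, here is a minimal C++ sketch (the class and member names are made up for illustration): the implicit and explicit uses of "this" are equivalent, and the call site binds the object's address to "this".

#include <cstdio>

class Player
{
public:
    void TakeDamage( int amount )
    {
        mHealth -= amount;           // shorthand for "this->mHealth -= amount"
        if ( this->mHealth < 0 )     // the explicit, equivalent form
            this->mHealth = 0;
    }

    int mHealth = 100;
};

int main()
{
    Player player;
    player.TakeDamage( 30 );                 // "passes the TakeDamage message"; &player is bound to "this"
    std::printf( "%d\n", player.mHealth );   // prints 70
    return 0;
}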
In a later article, I will discuss message passing, but for now let's take a detour into name binding, because we need to sort that out before dealing with the rest of message passing semantics.
Background
Last time I covered the data type system in the game engine that FIEA graduate students write each year in the programming 2 class, including the primitive types (int, float, string, reference, pointer, etc.) and the means of combining them (tables with name-value pairs). I mentioned notions of "functions" and also tossed around the phrase "means of abstraction" (which refers to how we can extend our language) and now return to elaborate on those topics.
But before diving into those topics, I need to review a few other notions, and partially retract some of what I wrote last time, specifically the proposal that we use dynamic binding for function calls.
[Retraction: I stated in the previous post that functions do not need to be their own type. While that is true, the dynamic resolution needed to differentiate between tables and tables-that-are-functions incurs avoidable runtime overhead. Furthermore, the fact that tables-that-are-functions could be distinguished from tables-that-are-not-functions does not, as far as I can tell, improve the language in any way. So functions should be their own type. That whole avenue was bone-headed and I regret having posted it.]
We need to understand the notion of "binding" -- associating a value with a name -- and the flavors of binding.
Binding
In a previous post I explained the difference between static and dynamic binding, but I will briefly review those concepts, clarify their differences, and describe how they interact with scoping.
Binding refers to the process of associating a value with a name, i.e. associating an address with a symbol.
Binding has at least two aspects: space and time.
The time aspect of binding has two flavors:
- Static Binding (a.k.a. early binding) resolves which value goes with a name at compile time.
- Dynamic Binding resolves names at runtime.
The space aspect -- scoping -- also has two flavors:
- Lexical scope binds a name to the symbol declared in the nearest lexically enclosing scope, which can be determined at compile time.
- Dynamic scope binds a name to the most recently established binding of that symbol on the call stack, which can only be determined at run time.
The following example demonstrates the difference between lexical and dynamic scope. Consider the following code (written in a made-up language):
procedure int procedure(int) MakeProc( int y )
{
    return procedure int lambda( int x )
    {
        return x + y ; // What is the value of y?
    }
}

procedure Caller()
{
    int y = 5 ;
    procedure newProc = MakeProc( 7 ) ;
    int z = newProc( 3 ) ; // What is the value of z?
}
The value of "z" in the example above depends on whether the language uses lexical or dynamic scope. If it uses lexical scope and closures, the answer is that y=7 inside lambda (because y comes from the value passed into MakeProc) and therefore z=7+3=10. If it uses dynamic scoping, the answer is that y=5 inside lambda (because y comes from the calling environment, i.e. Caller) and therefore z=5+3=8. In the dynamic scoping example, the value passed into MakeProc is not used.
This example reveals two drawbacks of combining dynamic scoping with dynamic binding: it violates the principle of least surprise (it seems like y should refer to the value in effect where the lambda was defined, but instead it refers to whatever y happens to exist in the calling environment), and it means that the lookup for y occurs upon each invocation of newProc instead of once (at compile time).
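For comparison, real C++ uses lexical scope: a lambda with a capture list produces a closure, so the equivalent program yields 10, not 8. (This sketch uses std::function merely as one convenient way to hold the returned procedure.)

#include <cstdio>
#include <functional>

std::function<int(int)> MakeProc( int y )
{
    // The lambda captures y by value: the y used later is the argument that
    // was passed to MakeProc, frozen into a closure at creation time.
    return [y]( int x ) { return x + y; };
}

int main()
{
    int y = 5;                        // never consulted by the lambda
    auto newProc = MakeProc( 7 );
    int z = newProc( 3 );
    std::printf( "%d %d\n", y, z );   // prints "5 10" -- not 8
    return 0;
}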
Tricks that make static binding look dynamic
True dynamic binding implies performing symbol table lookups at runtime, which we want to avoid, because it is expensive.
In some situations, however, we want the appearance of dynamic binding. For example, we want "this" bound to the object receiving the message. And when the method is virtual we actually want to bind the method itself based on the type of the object, which gives the appearance of dynamic binding for the function -- not just its data.
But these forms of late binding are not truly dynamic in the sense of names being resolved at runtime. In fact, the names are resolved at compile time, using a combination of tricks including offsets and pointer variables. So although the C++ style of quasi-dynamic binding increases the level of indirection, the binding occurs at compile time. Some languages have true dynamic binding, and perhaps that gives them an expressive power lacking in statically bound languages, but those languages elect to pay for that expressivity with reduced performance.
Though dynamic binding might give the language some elegance (because such languages can avoid trickery like using offsets and increased indirection), a game engine requires higher performance, so we cannot afford dynamic binding.
Let's take a look at some forms of quasi-dynamic binding and how compilers achieve the illusion of dynamic binding with the performance of static binding.
Binding local variables
Activation records include space for local variables. (Usually that record exists on a call stack.) All references to those local variables occur via indirect addressing, using indexes relative to the address of the top of the activation record. In other words, this uses a pointer and an offset. The offsets are effectively "baked" into the structure of the activation record, i.e. fixed at compile-time.
(In optimized builds for modern machines, local variables are likely to be in registers, but let us ignore this complication because it does not apply to virtualized languages such as our configuration language.)
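To make the pointer-plus-offset idea concrete, here is a rough sketch of what the compiled code effectively does; the record layout is hypothetical (real compilers choose their own), and the same record also holds the arguments discussed in the next section.

#include <cstdio>

// Hypothetical layout of one activation record for a routine with one
// argument and two locals.
struct ActivationRecord
{
    int argAmount;    // first slot in the record
    int localTotal;   // second slot
    int localCount;   // third slot
};

int main()
{
    ActivationRecord frame {};         // conceptually pushed onto the call stack
    ActivationRecord* fp = &frame;     // frame pointer, assigned at runtime

    fp->argAmount  = 30;               // every access is fp plus a compile-time offset
    fp->localTotal = fp->argAmount * 2;
    fp->localCount = 1;

    std::printf( "%d\n", fp->localTotal );  // prints 60
    return 0;
}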
Binding actual arguments to formal arguments
Actual arguments closely resemble local variables, except that the caller provides initial values for those variables. The code that references them uses the same trick: an address (the stack pointer) and an offset from that address.
Binding member variables within methods
Member variables are accessed using an offset relative to the address of the object. So once again, binding uses an address and an offset, but this time the address is the top of the object, not the activation record.
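A small sketch of the same idea for members (the GameObject type is illustrative): offsetof shows that the offsets are compile-time constants, and a pointer playing the role of "this" supplies the object's address at runtime.

#include <cstddef>
#include <cstdio>

struct GameObject
{
    float mX;
    float mY;
    int   mId;
};

int main()
{
    // The member offsets are fixed at compile time...
    std::printf( "%zu %zu %zu\n",
        offsetof( GameObject, mX ),
        offsetof( GameObject, mY ),
        offsetof( GameObject, mId ) );

    // ...and "self->mId" effectively dereferences (address of object + offset of mId).
    GameObject object { 1.0f, 2.0f, 42 };
    GameObject* self = &object;        // plays the role of "this"
    std::printf( "%d\n", self->mId );  // prints 42
    return 0;
}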
Binding virtual methods
Virtual methods are invoked by first reading a hidden member at a fixed offset from the top of the object (i.e. just like a member variable); that member is a pointer to a function table. Each method has an offset in that table, so the call requires yet another address-plus-offset dereference. So it's the same trick applied twice in a row.
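What the compiler generates for a virtual call can be sketched by hand. This is a simplification (real vtables hold additional entries), but it shows the two address-plus-offset steps: one to reach the table pointer inside the object, one to reach the function pointer inside the table.

#include <cstdio>

struct Object;                                // hand-rolled stand-in for a polymorphic base
using UpdateFn = void (*)( Object* );

struct VTable
{
    UpdateFn update;      // one slot (offset) per virtual method
};

struct Object
{
    const VTable* vptr;   // hidden pointer stored at a known offset in every object
    int mValue;
};

void PlayerUpdate( Object* self ) { std::printf( "Player %d\n", self->mValue ); }
void EnemyUpdate ( Object* self ) { std::printf( "Enemy %d\n",  self->mValue ); }

const VTable playerVTable { &PlayerUpdate };
const VTable enemyVTable  { &EnemyUpdate  };

int main()
{
    Object player { &playerVTable, 3 };
    Object enemy  { &enemyVTable,  7 };

    // "object.Update()" becomes: fetch vptr from the object (address + offset),
    // fetch the function pointer from the table (another address + offset), then call.
    player.vptr->update( &player );   // prints "Player 3"
    enemy.vptr->update( &enemy );     // prints "Enemy 7"
    return 0;
}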
Binding "free" variables
Free variables only exist in languages which allow the creation of functions at runtime. In such languages, the created function can refer to variables defined outside the scope of that function. In that case, the function has an associated "closure" which is a data record that includes the values of variables at the time the closure was created. The code inside the function effectively refers to those "free" variables by dereferencing the address of the closure, plus (you guessed it) an offset for the specific variable. So once again, this uses the address-plus-offset trick, with yet another record.
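The closure record can likewise be written out by hand, which is roughly what a C++ lambda with a capture list compiles down to (the struct name here is invented for illustration):

#include <cstdio>

// The closure is just a record holding the captured value; the "function"
// reads it via the closure's address plus an offset.
struct AddYClosure
{
    int y;   // the captured free variable

    int operator()( int x ) const { return x + y; }   // reads this->y
};

AddYClosure MakeProc( int y )
{
    return AddYClosure { y };   // capture y at the moment the closure is created
}

int main()
{
    AddYClosure newProc = MakeProc( 7 );
    std::printf( "%d\n", newProc( 3 ) );   // prints 10
    return 0;
}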
So in each case, statically-bound languages reuse the same trick (address-plus-offset) over and over, except that each case uses a different address:
- Locals and arguments ↔ address of activation record
- Members ↔ address of object
- Virtual methods ↔ address of object, then address of function pointer table
- Free variables ↔ address of closure
Abstracting from this pattern, we can say that the compiled code always dereferences some member of some record, where the address of the record resides in a pointer variable. The offsets are fixed at compile time (hence statically bound) but the pointer variable is assigned at runtime (hence resembling dynamic binding). So although the cost is not as low as using immediate addressing, a pointer dereference certainly takes less time than a symbol table lookup. Also, if all of the members are contiguous then only the first pointer dereference is likely to incur a high memory fetch cost; subsequent accesses to adjacent members are likely to be in cache already.
Tying this to the engine configuration language
The game configuration language will not use exactly the same techniques that a regular compiler uses, but we can perhaps take inspiration from them. Clearly the recurring theme here involves dereferencing an address of a record (which, in our language, we call a "table") plus an offset into that table. Our tables can indeed be referenced by offset, so this approach seems promising.
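A very rough sketch of where this might lead -- the Table and Value types below are placeholders, not the engine's actual classes -- is to resolve a name to an index once, then use address-plus-offset access thereafter:

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Value
{
    std::string name;
    int data;
};

struct Table
{
    std::vector<Value> entries;

    // Expensive: search by name (the analogue of a runtime symbol table lookup).
    std::size_t IndexOf( const std::string& name ) const
    {
        for ( std::size_t i = 0; i < entries.size(); ++i )
            if ( entries[i].name == name ) return i;
        return entries.size();   // not found
    }

    // Cheap: address plus offset, usable on every access once the index is known.
    Value& operator[]( std::size_t index ) { return entries[index]; }
};

int main()
{
    Table table;
    table.entries = { { "health", 100 }, { "armor", 50 } };

    std::size_t armorIndex = table.IndexOf( "armor" );   // resolve the name once
    table[armorIndex].data -= 10;                        // reuse the offset thereafter

    std::printf( "%d\n", table[armorIndex].data );       // prints 40
    return 0;
}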
I leave the details for another post, but this article clearly points in a particular direction.
