The Actor model of concurrency readily lends itself for use in the FIEA game engine, which is already set up to be an event-driven system composed of entities (which resemble actors). This prepares the engine to exploit future massively parallel hardware platforms and avoids some of the worst problems of concurrency -- synchronization and deadlock -- and minimizes contention. The changes required do not affect the configuration language -- just isolated parts of the back-end and some environment access policies. But the asynchronous communication scheme required by a strict Actor model implementation can incur excessive performance hits for reply-request transactions so we probably have to make an exception for those transactions.
INTRODUCTION
In this series of articles, we review operating systems concepts in the context of making a game engine. Operating system concepts include process, memory and device management. Process management entails processes, threads, scheduling and synchronization. This article visits the topic of process management, and focuses on the Actor model of concurrency as it relates to game engine architecture.
In the Actor model, an actor is a computational entity which sends and receives messages, makes decisions and creates other actors. This bears some resemblance to entities in the FIEA game engine, which send and handle events, receive a share of CPU time each frame, and perform arbitrary processing.
Though some cursory attempts have been made to thread it, the FIEA game engine lacks an inherent concurrency infrastructure. Conventional wisdom suggests that computers of the future will have many more processing cores operating at about the same speed they do now. We therefore explore the possibility of formally adopting the Actor model of concurrency as an integral part of the FIEA game engine and its configuration language.
Related Articles
Ruben Vermeersch describes how using the Actor model is easier than threads.
The blog "Valued Lessons" has an article about message passing concurrency in Python.
James Leigh describes Utilizing a Multi-Core System with the Actor Model.
The SALSA Tutorial provides a description of the Actor model which focuses on practical use.
Gernot Heiser discusses the microkernel performance debate, which relates to this discussion in that microkernels are essentially message passing systems in an interprocess communication system.
THE CASE FOR ADOPTING THE ACTOR MODEL
The FIEA game engine processing model happens to resemble the Actor model - except that the engine is not (in its usual form) concurrent; although all the entities operate almost as though in an extremely simplified cooperative multitasking system with rudimentary time slicing, they do not actually operate in separate threads (though students have introduced actual multi-threading with some consideration for synchronization). With other changes afoot regarding how entities access their environment (with more strongly enforced scoping), switching to the Actor model might be easy.
But why bother? What would it buy us? And what would it cost?
Avoiding Deadlocks
The question should not be whether to support concurrency, but how. The fundamental problems boil down to deciding which processes to run on what threads, and how those threads should communicate (and synchronize). An obvious and common choice entails simply creating threads and running processes on them, communicating via shared memory which is synchronized via locks. This scheme, however common, invites dreaded deadlocks and contention issues. So although we definitely want concurrency, we would like a model free of deadlocks.
Recall the Coffman recipe for deadlocks:
- The system must synchronize via mutex locks or their equivalent. This is unavoidable at some level but we can limit the number of locks.
- Processes already holding a resource lock can request another lock, i.e. multiple locks are involved.
- A process holding a lock cannot be forced to release that lock.
- Multiple processes must attempt to obtain the same resource.
I boil this down to the following: Multiple processes must attempt to obtain multiple locks in different orders. You can avoid deadlock by strictly requiring that the order in which processes obtain locks is globally uniform. That is guaranteed to eliminate deadlocks but it poses some problems, such as being aware at all that this situation can arise; in large code-bases with multiple programmers, that is hard. Furthermore, obtaining locks in a strict order implies that in many situations, a process will obtain locks that it usually does not require, on the off chance that it might require it while holding another lock; this creates otherwise avoidable contention.
The Actor model breaks this chain by disrupting the hold-and-wait condition; actors never hold multiple locks because all of their communication with other actors (which is the only communication the model supports) happens via asynchronous message passing. From the perspective of the Actor, it never holds any locks, ever. So, no deadlocks.
Adopting a strict Actor model so implies that theoretical results written about that model apply to the engine. Although so far, I am not personally aware of thunderous earth-jiggling benefits of that fact, research into the Actor model is ongoing and perhaps somebody will conclude, using the Actor model solves Important Problems.
Each actor runs in its own thread and, as far as other actors are concerned, that could be on another machine on another planet. The language Erlang exploits this to support hot swapping code. As Vermeersch states, "To allow for non-stop application, Erlang support hot-swapping of code. This means that it is possible to replace code while executing a program, making it possible to do upgrades and maintenance without interrupting the running system." Imagine never again having to log off of WoW for a client or server upgrade. That's freaking cool.
Alright, so the Actor model looks attractive. How would we implement that in the FIEA game engine? And if the Actor model has a feature or limitation we do not like, what are the ramifications of departing from this model?Buffering Messages
The Actor model specifically avoids posing requirements on how messages are ordered. In this regard, processes in such a model act like remote peers in a packet-switched network using UDP -- the deliver order is not guaranteed. The upside is that if a simulation based on the Actor-model actually uses a protocol upon UDP, then the network code will not have to spend extra effort ordering packets.
The downside is that sometimes you want to guarantee delivery order. Some systems (e.g. Scala) provide mechanisms for synchronous delivery (which can be used to guarantee delivery order), but the synchrony has the potential to violate the terms of our "no deadlocks" coupon.
The ordering could happen inside the "mailbox" implicitly associated with each actor. Messages could include ordering information. The actor could simply allow messages to queue up until a contiguous sequence of in-order arrives, at which point the actor can process them. This would not violate asynchronous delivery. It adds complication to the message processing subsystem but let us consider the problem effectively solved (in theory), should the need arise.
Often message-passing has the form of request-reply, where an entity simply wants information from another entity. In the strict Actor model, this transaction would require a pair of asynchronous messages, which can introduce undue latencies and inefficiencies.
Recall that part 1 described how asynchronous message passing introduces computational inefficiency; there are twice as many synchronizations happening (each of which entails an expensive context switch) and the message data has to be copied both into and out of the message buffer. In contrast, a synchronous request has only a single synchronization primitive and can avoid copying message data at all, if the replier processes the message immediately (which is likely). L4 exploits this to make great performance improvements over other microkernel designs (like Mach) which use asynchronous message passing.
Request-reply is a subset of the synchronous communication problem, in that we can make special exceptions to how the request is posed. We have several options:
- Implement the policy that if a request is made, and the queried entity is occupied at the time of request, that the request simply fails, and that code must be written to handle that. This seems complicated and awkward. Worse, it could create livelock which is even worse that deadlock.
- Rephrase request-reply cases so that any entity that might need information from another simply subscribes to that information and automatically receives it when it changes (i.e. uses observer a.k.a. publish-subscribe design pattern). This avoids latency at the risk of some possibly avoidable overhead, if the information is not needed each time it changes. This also avoids needing to treat the situation as a special case; the system described so far can already do this. Let's keep this in our back pocket.
- Partition entities to run in the same thread, in which case contention would be impossible. This has the potential to have the least overhead since it avoids the need for a lock, but also increases the granularity of concurrency.
- Allow synchronous communication only for request-reply transactions. This increases the chances of contention, though, since the requester has to wait on the replier, which in turn could be waiting on another replier. This seems like a recipe for deadlock: A replier could block while waiting on a reply from its requester (or in a transitive chain of requesters). (But L4 implements synchronous message passing, how does it avoid deadlock? By the intelligence and care of its authors.)
In practice, we can apply a combination of the above techniques, depending on the situation. Synchronous communication is the most efficient but it re-introduces the possibility of deadlock, so it misses one of the major advantages of the Actor model. Aside from the infamously poor performance, old microkernel systems are also notorious for deadlocking. Since this article argues in favor of adopting the Actor model based in part on the avoidance of deadlocks, it seems inappropriate to simply toss that feature out for the sake of performance. But performance is critical so we have a real dilemma here. It's an open issue and I welcome discussion.
Locality: Who can send messages to whom.
In the Actor model, actors do not share any data whatsoever. Only messages can route data between Actors. That implies no access to global variables or static methods in the traditional sense. I believe that would not impose undue issues, if pressed, but for the sake of efficiency I believe we could make exceptions for read-only data such as controller input and the current time.
HOW TO ADOPT THE ACTOR MODEL
The FIEA game engine can implement the Actor model with minimal change to the configuration language. Messages would be passed via the event system (as is done now). A restriction in scope resolution syntax and/or rules would disallow accesses to non-local data. Then behind the scenes we serialize the event system and can run each Actor in its own thread. Magic.
Heck, for the sake of debugging we could even randomly reorder events in the queue -- to test the robustness of message handling out of order.
A few other details. In "traditional" terms, each actor runs in its own thread and has only 1 lock -- a lock on itself. Effectively, every "reaction" in the actor is a critical section. Actors only process requests; their "process" is just a sequence of event handlers. That means that even the "update" loop itself would have to be written as a response to an "update" event, which gets thrown periodically. In other words, it would NOT suffice to consider each entity as a process that runs the "simulation" process all of the time, and handles messages on-demand. The "simulation" process would be just another event handler like any other. No big deal.
Also, rendering would require some intermediary since it involves sending data (a message) to yet another actor -- a special one which has a monopoly on the GPU. That's not trivial, but it also aligns with how rendering happens on multi-threaded game engines anyway. In fact, usually the rendering of an actor happens in a monolithic process (which we can model as an actor) which serializes all rendering. Essentially, rendering consists of populating vertex buffers and issuing draw calls. Populating vertex buffers can happen anywhere. Only the draw calls (and the associated render state setup) has to happen serially. So it seems prudent to isolate those two processes. And indeed the FIEA game engine already does that. It currently, however, happens to issue the draw calls as though it happens from the entity itself. That would have to change to a scheme with a single entity which mediates all draw call submission. The details of that subsystem constitute a topic for another post.
CONCLUSION
The benefits of the Actor model outweigh the costs so we could adopt it. The Actor model provides concurrency free of deadlocks and prepares the engine for the promised massively parallel game consoles of the future. The processing management subsystem of the FIEA game engine lends itself to using the Actor model with minimal changes to the configuration language.
This is just a start. We still have to consider process migration. For architectures like the PS3 and its weird little SPU homonculi, process migration is not just a whiz-bang nice-to-have, something akin to it is required. But it seems like such changes can happen behind the scenes and, once again, the configuration language would not have to change to support it.
PENDING ISSUE
It seems that we cannot have both the guarantee of a deadlock-free IPC system and also have efficient request-reply transactions, so this issue remains open for debate. I have proposed a few options for request-reply. Perhaps it would be appropriate to try some or all of them and see what works best. In any case, it makes for an excellent discussion topic. How should it work?
Post your comments!

1 comments:
First!
Post a Comment