Wednesday, September 13, 2006

Your friendly QueueUserAPC

I'm a reasonably active reader of programming.reddit.com and lately there has been a whole ton of articles about concurrent programming, so here's my 2c to throw into the ring. Basically, there seem to be 2 approaches to concurrent programming these days - the classic shared memory/locking model, which the majority of applications these days use, and the 'erlang style' nothing shared/message passing model. They both have their uses - I guess if you were using erlang you'd do that, if you were using C# you would use shared/locking. My job is to program mostly C++ apps on windows, so I'd like to share what I think is a neat trick you can use ( when programming C++ apps on windows ). This may be commonly known by everyone, but I've not seen it mentioned on reddit or anywhere else, so I'm guessing that it's not. In windows NT4 and up, native windows threads have what is called an APC Queue. APC stands for Asynchronous Procedure Call. This is used if you are using any of the Overlapped IO functions ( ReadFileEx and WriteFileEx amongst others ). Basically what it is, is a list of function pointer/ULONG_PTR pairs. So, you ask, what good might that be? Well, this APC Queue gets processed whenever the thread enters what is called an 'Alertable Wait State'. "Alertable Wait State" is a fancy name for "Has called the SleepEx function, WaitForSingleObjectEx function, or one of the many other Wait*Ex functions". So, wherever you'd do a WaitForSomething, you do a WaitForSomethingEx, and windows automagically pulls things off the APC queue and executes them. The QueueUserAPC function allows you to insert your own functions into this Queue. In a nutshell, it says "Execute this callback function in this thread" If you haven't clicked onto why that's so cool, bear with me... Observe the following program. If you're a windows C/C++ coder, it probably looks somewhat familiar #include "windows.h" #include <list> std::list<int> g_listOfInts; DWORD WINAPI ThreadProc( LPVOID param ) { g_listOfInts.push_back( 7 ); } void AddToList( int param ) { g_listOfInts.push_back( param ); } void PrintList() { std::list<int>::iterator iter; for( iter = g_listOfInts.begin(); iter != g_listOfInts.end(); ++iter ) std::cout << *iter << std::endl; } void main() { DWORD dwThreadId; HANDLE hSecondThread = CreateThread( NULL, 0, ThreadProc, 0, &dwThreadId ); AddToList( 5 ); //... do some other stuff PrintList(); } Now, this code of course has a big nasty race condition. As both threads insert into the list, they could both do it at the same time, which would either cause the list to be invalid, the program to crash, or memory corruption, or any number of other bad things. Also, the second thread could modify the list while the iterator is looping over it, which is also problematic. The classic solution is to lock the list. You could use a windows CRITICAL_SECTION, a boost::mutex::scoped_lock, or dozens of other things, which all boil down to "If any other thread wants to look at this object, it must wait for any other threads which might also be lookin at or modifying it. Also, if everything that needs to access the object must wait for everything else, we effectively serialise access to that variable down to 1 thread - if we've got 32 CPU's running 32 threads, 31 of them are going to be waiting on our lock, so we have zero performance improvement over just running a single thread. The 'non-shared' solution would be to somehow enforce that one thread "owns" the list. If any other threads want to get any data from it, they must send a message to the "owner" thread, and it must reply. This is probably trivial in something like erlang, but I don't know erlang, so I can't comment. In Windows/C++, you've got trusty old Windows Messages ( using MSG and PEEKMESSAGE, etc, like you'd have in any windows GUI app ). However, if you were to use this approach for any nontrivial program you'd end up creating hundreds of GET_X and GET_Y messages, and it would soon become unmanagable. QueueUserAPC to the rescue! Look at the next program. #include "windows.h" #include <list> std::list<int> g_listOfInts; HANDLE g_terminateSignal; DWORD WINAPI ThreadProc( LPVOID param ) { g_listOfInts.push_back( 7 ); while( WaitForSingleObjectEx( g_terminateSignal, INFINITE, TRUE ) == WAIT_IO_COMPLETION ); //apc's can execute in this loop while we wait for the quit signal. } void CALLBACK ApcAddToList( ULONG_PTR param ) { g_listOfInts.push_back( (int)param ); } void CALLBACK ApcPrintList( ULONG_PTR param ) { std::list<int>::iterator iter; for( iter = g_listOfInts.begin(); iter != g_listOfInts.end(); ++iter ) std::cout << *iter << std::endl; } void main() { g_terminateSignal = CreateEvent( NULL, TRUE, FALSE, NULL ); DWORD dwThreadId; HANDLE hSecondThread = CreateThread( NULL, 0, ThreadProc, 0, &dwThreadId ); QueueUserAPC( ApcAddToList, hSecondThread, 5 ); QueueUserAPC( ApcPrintList, hSecondThread, NULL ); //magically assume we've written some code to wait for a WM_QUIT //and set g_terminateSignal when we get it. } So, what's the difference here (apart from the code looking all werid and different)? Well, we create our second thread, which adds something to the list, like it did last time, but instead of exiting, it does a WaitEx for the terminate signal to be set. It's now in the "Alertable Wait State". Also, our main function doesn't mess with the list any more. The program is written so the second thread "owns" the list. If the main thread wants to modify it, he must make it happen in the second thread. In this example, first ApcAddToList and then ApcPrintList are "Queued" to the thread (which is in it's alertable wait state), where they are executed. Because everything involving the list only ever happens in thread 2, we no longer have our race condition. We don't have to lock at all either, instead of the threads waiting for each other so they can access the locked memory, they are free to carry on doing other stuff while thread 2 does whatever it needs to. Just like as if we'd written it in one of those fancy concurrent no-shared-state languages, but without having to rewrite your entire codebase. Cool, no? PS: If you're thinking "That's cool, but how can it be useful given that the APC callback has to be a free C-style function and only has one 32 bit parameter...", the answer will come soon. PPS: for those of you that can't wait for me to explain how it can be more useful, go and look at boost::bind.

1 comment:

Blah said...

Do you have a better-formatted post than this one? It's a bit hard to follow your post. Thanks!