Friday, November 10, 2006

Why everyone wants to get rid of the parentheses in lisp

This post is pretty much a response to Eli's post at http://eli.thegreenplace.net/2006/11/10/the-parentheses-of-lisp/ - it's a good one, so read it if you haven't. Eli seems to say this: he's noticed a lot of people trying to use whitespace to remove most of the parens in lisp, and can't understand why. His opinion (as it seemed to me) was that removing them would be counterproductive, because the parens (and the uniform syntax they give you) are what make lisp so much better than everything else.

My 2c:

The case FOR s-expressions

* Uniform syntax is theoretically very appealing (from a purist point of view)

* It lets you write macros. Macros are incredibly powerful and basically awesome. I have macro-envy in most languages most of the time.

The case against s-expressions

* s-expressions are not at all like how I think about things (or how anyone else who is not a die-hard lisper thinks about things), because the lack of syntax is completely at odds with natural language.

To elaborate on that last point: because English is my native language, that's how I think. A fair amount of the time, code which I consider "good" ends up looking like (and structured like) a shorthand version of English, because that's what I find is clear and understandable.

I can write Ruby, C++, C# and most other blub languages in a way that maps relatively closely to how I think. In short, they 'fit my brain'. I also realise that over time my brain has adjusted to fit them as well, so I am aware that this kind of thing is probably always going to be biased towards the incumbents.

Conclusion

I believe that with the exception of a few idealists who hold the 'uniform syntax for purity's sake' argument above all else, pretty much everyone else is in it for the macros. To get them, I see two paths:

1) Change yourself: Do enough lisp programming, for a long enough time, and put in the effort to make your thought process (at least with respect to programming) match with s-expressions. This seems to be what all the 'hardcore' lispers have done, and the evidence seems to point towards this having a huge and amazing payoff. I'm nowhere near this goal, but this is eventually what I'm aiming for with my casual lisp programming... It is, however, (like any learning) long and, dare I say, hard.

2) Try and change lisp: Try and make the syntax fit better with your brain, so you (and all the other regular programmers) don't have to put in the hard yards adjusting quite so much. This is what I believe the lisp-whitespace people are aiming for.

I do find the whitespace-lisp easier to read/understand than normal lisp, and it doesn't seem to sacrifice any functionality (macros still work), so it could be a winner.

Then again, significant whitespace brings in a whole host of other problems, and makes it not so "pure", so perhaps not.

Is it a good idea? In the end, I vote for a definite "maybe" :-)

Tuesday, October 17, 2006

Programming best practices

I was going to call this "X secrets of highly effective developers", like some other people, only these things shouldn't be secrets. Note this is as always my not-so-humble opinion, so it is entirely likely that this article is either a) misguided, b) missing things, or c) entirely wrong, but I can't give you anyone else's opinion now, can I?
These are all typical clichés, but I'd like to try to explain them anyway, just as a brain dump.

1. Code for people, not computers.

This is really the absolute number one goal. Everything else can be taken as a corollary to this.

If you don't code for people, you are writing un-maintainable code. However, it's easy to throw the term around like a lot of other slogans/buzzwords without actually having a solid understanding. What this means to me is in fact "Try and write your programs as if they were plain English".

Well, you know, not quite English, because English has its own giant set of problems too, but the point I'm trying to make is that someone else should be able to read your code and it should flow as if it has topics, headings, sentences, paragraphs, and so on. You should read it like a book, not decipher it like a code. In fact, code is a crap word; we should call it something like "instructions" instead.

How do we do this? With years of experience, and a constant drive to learn and apply new things.
But for now, here are a couple of pointers which I find have helped me so much that I feel like slapping other developers who don't follow them:

2. Use good names

This is obviously a subset of #1, but if I had to pick the most important thing, this would be it. Again, "Use good names" is a meaningless phrase; what you should do is "call things what they are". Whenever you have a variable/function/class/whatever, ask yourself "what actually is it? A buffer? A file handle? A person? What?", and then call the variable that. Simple, but overlooked by most programmers most of the time. Other developers should almost always be able to look at a variable/function/class and make a correct guess as to what it is and what it might do; otherwise you probably have a bad name.

This is doubly useful because sometimes you will have trouble expressing what something actually is. Sometimes this is just not being able to think of the right word (go learn English :-D ), but more often than not, it is a strong signal that your design isn't correct, or that you have accidentally gone down the twisty path towards a tangled mess of garbage.
For example, if you can't think of a single good name for a class or function, it's probably because it's doing more than one job, and should be broken up into 2 classes/functions.
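To make the naming point concrete, here is a small C++ sketch (all function names are hypothetical, invented for illustration): a function whose only honest name is "X and Y", followed by the same logic split into two functions that are each easy to name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hard to name honestly: it parses a comma-separated line AND adds up the values.
// The most accurate name available is "parseAndSum", which is the warning sign.
int parseAndSum(const std::string& line) {
    int total = 0;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) total += std::stoi(field);
    return total;
}

// Split into two functions, each of which has an obvious name.
std::vector<int> parseCommaSeparatedInts(const std::string& line) {
    std::vector<int> values;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) values.push_back(std::stoi(field));
    return values;
}

int sum(const std::vector<int>& values) {
    int total = 0;
    for (int value : values) total += value;
    return total;
}
```

The split versions are also more reusable: anything else that needs to parse the line, or to add up some numbers, can call just the half it needs.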

At the end of the day, if you can't even think of a reasonable name for a thing which makes your program work, then how will anyone else (usually you, 6 months later) ever hope to understand it?

3. Use abstraction

This can be "bottom up" programming, or "top down" or whatever design methodology is in fashion at the time, but the important thing is that you build code on top of other code, not alongside it. You are creating a pyramid, not tiling a floor.
That may not have made too much sense, so to try to explain it a bit better, think of writing a program that opens a file, writes some string to it, and closes the file.

If you were to use the all-too-common floor tiling method, you'd have a big long function (or lots of small functions running in sequence, whatever), which would do the following in sequence:
Allocate a file handle -> call the API open function -> allocate the memory for the string -> keep track of how many bytes we have -> write the memory to the file using the API file writing function -> close the file -> free the string memory.
All the operations are on the same "level"; the stuff is just happening in a big row, like laying down tiles next to each other.

If we are to write the above using abstractions, we'd instead have a file class and a string class. The file class would deal with the file API (handles and stuff), and the string class would deal with the string memory. Then, instead of allocating handles and memory, our program can just deal with file and string objects. It would do the following:
Create a file object -> Create a string object -> call file.write( string ) -> cleanup done automatically by objects.
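A rough C++ sketch of the two styles, assuming the standard library's std::ofstream and std::string as the "file class" and "string class" (the post doesn't name specific classes, and the function names here are hypothetical). The first function lays every step side by side; the second lets the objects look after their own resources.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <string>

// "Floor tiling": every step is spelled out at the same low level.
bool writeGreetingTiled(const char* path) {
    FILE* handle = std::fopen(path, "w");                    // allocate a file handle
    if (!handle) return false;
    const char* text = "hello, world";
    std::size_t length = std::strlen(text);                  // keep track of the byte count
    char* buffer = static_cast<char*>(std::malloc(length));  // allocate the string memory
    if (!buffer) { std::fclose(handle); return false; }
    std::memcpy(buffer, text, length);
    std::size_t written = std::fwrite(buffer, 1, length, handle);  // write via the C API
    std::free(buffer);                                       // free the string memory
    bool closedOk = std::fclose(handle) == 0;                // close the file
    return written == length && closedOk;
}

// Layered: the file and string objects own their resources, and their
// destructors do the cleanup automatically.
bool writeGreetingLayered(const std::string& path) {
    std::ofstream file(path);
    std::string text = "hello, world";
    file << text;
    return static_cast<bool>(file);
}   // file is flushed and closed here, and text's memory is released
```

In the layered version the cleanup steps disappear from the calling code entirely, which is the point: the caller only deals with files and strings.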

When you write programs correctly with abstraction, you can stack the abstractions on top of each other, leading eventually to code that is almost like pseudocode or a domain-specific-language.
This is what object oriented programming, and a lot of other techniques, are actually for (as opposed to the common retarded view of inexperienced programmers that OO isn't being done correctly unless all your classes use inheritance somehow).

4. Do the simplest thing that can possibly work

Now, before I get branded as an agile zealot: not everything from agile is actually bad. What this actually means is: do the right thing, but do it the simplest and smallest way you can. Don't write code which doesn't directly help you get things done, or which tries to solve problems you don't actually have.
This also doesn't just apply to your higher-level design, but low level too. Functions/classes/interfaces/etc should all be as simple as possible, and do only what they need to.

A classic example of doing the wrong thing here is building a big pile of classes and interfaces and message-handling code before you actually attack your main problem. Yes it's fun to build frameworks and architecture, but at the end of the day, it'll probably just get in the way.

5. Don't repeat yourself, refactor instead.

If you are like lots of developers I've seen, and believe the best way to start a new program/library/class is to find a similar one and copy/paste it into a new file, STOP NOW. BAD PROGRAMMER! SIT!

If you find yourself writing the exact same code twice, refactor it into a common function or class.

If you find yourself writing similar code twice, refactor the common bits into another function or class (generics, dynamic types or other ways of dodging verbose typing, and first-class functions are a big win here), and have the remaining different bits as small and clear as possible.
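A minimal C++ sketch of refactoring similar code, with hypothetical function names: two filters that would otherwise repeat the same loop share one helper, and only the differing test is passed in as a function (C++11 lambdas stand in for first-class functions here).

```cpp
#include <functional>
#include <string>
#include <vector>

// The common bit: one loop, written once.
std::vector<std::string> selectNames(const std::vector<std::string>& names,
                                     const std::function<bool(const std::string&)>& keep) {
    std::vector<std::string> selected;
    for (const std::string& name : names)
        if (keep(name)) selected.push_back(name);
    return selected;
}

// The different bits: small and clear, each just a test.
std::vector<std::string> shortNames(const std::vector<std::string>& names) {
    return selectNames(names, [](const std::string& n) { return n.size() <= 4; });
}

std::vector<std::string> namesStartingWith(const std::vector<std::string>& names, char first) {
    return selectNames(names, [first](const std::string& n) { return !n.empty() && n[0] == first; });
}
```

Adding a third filter now costs one line instead of another copy of the loop.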

6. Good design does not come from 'design,' but from refactoring.

If you do all the other stuff, you'll probably find yourself left with a ton of small functions and classes, and all your other classes will be using them. This is already better than lots of giant functions which duplicate code and functionality, but it can get a bit messy if you don't clean them up.
Most likely, however, a bunch of your helper functions will all take similar parameters. These are prime candidates for making a new class. Remember also, not everything needs to be a class. It's fine to have a bunch of global functions in a namespace (or a static class if you're stuck in C# or java), if that's the nicest way to think about those kinds of objects. The main goal is to always try and make sure that your helper/library functions are as simple, clean, and useful as possible, and are arranged in clear and obvious groups. You can even write unit tests for them :-)
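A small C++ sketch of that step, with hypothetical names: helpers that keep taking the same host and port parameters, and then the same behaviour gathered into a class that holds them.

```cpp
#include <string>
#include <utility>

// Before: free helper functions that all take the same (host, port) pair.
std::string hostAndPort(const std::string& host, int port) {
    return host + ":" + std::to_string(port);
}

std::string makeUrl(const std::string& host, int port, const std::string& path) {
    return "http://" + hostAndPort(host, port) + path;
}

// After: the parameters that kept travelling together become a small class.
class Endpoint {
public:
    Endpoint(std::string host, int port) : host_(std::move(host)), port_(port) {}

    std::string hostAndPort() const { return host_ + ":" + std::to_string(port_); }
    std::string url(const std::string& path) const { return "http://" + hostAndPort() + path; }

private:
    std::string host_;
    int port_;
};
```

Callers now create one Endpoint and stop threading the same two arguments through every call.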

Sooner or later, you'll probably end up with either some kind of framework (to my knowledge, this is how Rails came about), or a set of general classes, like the .NET base class library, but for dealing with the kinds of problems your company faces, or perhaps both.
This is excellent, as you'll be able to re-use these things over and over again in future, meaning next time you have to write that stupid app which has to read the registry and write to files, you'll be able to use your framework/library code and do it in 5 minutes instead of a day. Your boss will love you, and you won't mind programming in C++ anymore because you can actually get things done now.

Also, because you'll have created this framework/library code based on other working code, and based on what you actually need to do, and refactored it to best fit your problems as you go, you'll have stuff which is actually useful and good.
People are stupid, and 99% of us can't design our way out of a paper bag. Most frameworks that get 'designed' up front wind up completely missing the point and solving the wrong problems in the wrong way. But, by keeping existing code simple, clean, non-repeating, and constantly refactoring it, we can end up with some well structured and maintainable code anyway. Owzat? :-)

Sunday, October 15, 2006

Diffing itunes music libraries with ironpython

Intro:

Ages ago when I first played with python, I found it was pretty cool, in a weird sort of way, and that I could probably love it once I had spent 6 months bending my brain around its quirky bits (too many underscores, whitespace importance, inconsistent and strange libraries)... And then soon after, I found ruby, which had none of the above problems, so I didn't bother learning any more python...

Until a few weeks ago, when IronPython was released.

For the uninitiated, IronPython is python which runs on and inside the .NET CLR (or mono, just not as quickly). My biggest blocker for regular python was the built in libraries/docs. I'm bound to be flamed, but the ones I looked at (file access, networking, HTTP, etc) just were not intuitive. IronPython however is a dream, because it uses all the .NET libraries instead (or as well as, if you like). I'm pretty familiar with the .NET BCL, so this was great.

Actual content :-)

During my ~4 years at my current job, I have built up a large collection of music which I'd listen to. I also have a large collection of music at home. As I'm leaving in 3 weeks, and will lose all that data, I wanted to take a copy of the music from my work computer home.

The problem with this is that I only have a 6 gig ipod mini to transfer the songs on, so I can't just copy them all. I needed to diff the 2 music libraries, and only copy the songs that I don't have at home already. iTunes exports a large hairy pile of crap XML file when you ask it to export its library, so I thought this would be an opportunity to play about with some IronPython, and post it on the net in case it's useful to anyone else.

Here it is, the comments are the documentation :-). Hopefully it's useful, if only as a quick demo of how things work in ironpython.

# Import all the libraries we'll need
import clr
clr.AddReferenceByPartialName( "System.Xml" )
from System import *
from System.IO import *
from System.Xml import *

# Create a helper function to convert the iTunes XML file into a hash so it's actually useful
# An example of one of the hash entries: ret[ 'Disturbed: Prayer' ] = 'file://C:/path/disturbed_prayer.mp3'
def fileToHash( fileName ):
    ret = {}
    doc = XmlDocument()
    doc.Load( fileName )
    # The XPath is ugly... Export your itunes library and take a look to see why
    for elem in doc.DocumentElement.SelectNodes( "/plist/dict/dict/dict" ):
        # song name is always the first <string>
        song = elem.SelectSingleNode( "string[1]/text()" ).Value
        # artist is always the second <string>
        artist = elem.SelectSingleNode( "string[2]/text()" ).Value
        # Unfortunately the filename is not fixed in the structure, so we have to
        # find <key>Location</key> and then move to the next element after it
        path = elem.SelectSingleNode( "key[text() = 'Location']" ).NextSibling.FirstChild.Value
        # Add it to our return-param
        ret[ "%s: %s" % ( artist, song ) ] = path
    return ret

# Parse both files into separate hashes
homeSongs = fileToHash( "itunes library home.xml" )
workSongs = fileToHash( "itunes library work.xml" )

# Create a new dict containing only songs that are at work but NOT at home
diffSongs = dict( [ (song, workSongs[song]) for song in workSongs if song not in homeSongs ] )

# Write them all to the output file
# The format of the output file is: file://C:/path/disturbed_prayer.mp3 # Disturbed: Prayer
# The reason I've done it like this with the path first is hopefully to make it easier
# for another script to be able to use it as a list of filenames to copy...
writer = StreamWriter( "diffSongs.txt" )
for line in [ "%s # %s" % (diffSongs[song], song) for song in diffSongs ]:
    writer.WriteLine( line )
# Close the writer so the buffered output actually gets flushed to disk
writer.Close()

Disclaimer:
1) This was meant to be quick to write, I didn't care about run performance or any other nifty tricks - it only takes 3 seconds to parse 2x 3.5 meg XML files, and that's good enough for me.
2) Sorry about no syntax highlighting; I couldn't find any decent way to do it short of screenshotting my PSPad window.

Wednesday, October 11, 2006

Vista RC1 and RC2: Impressions

Over the last couple of days I've installed both vista RC1 and RC2 at home, and today installed vista RC2 at work. Here's a quick brain dump of my thoughts about various things that have cropped up. Except where I explicitly say so below, RC1 and RC2 are pretty much the same.

Aero Glass (fancy graphics):

My home machine is an Athlon 64 3200+ with 512MB of RAM and a Radeon 9700 graphics card with 128MB. The graphics card is the main thing that gets hit by aero, and in RC1, aero wasn't enabled by default after the install. A quick trip to the control panel to turn it on, and it was fine. I really liked it. Sure, it's just eye-candy, but who said computers have to be ugly? All in all I was very happy...

HOWEVER: RC2 decides that I need 1 gig of RAM to enable aero. I haven't been able to find any overrides as of yet. This is stupid. Someone in the Microsoft marketing department has gotten their nose into this or something, because I KNOW aero runs fine on this machine with 512MB of RAM, having just run it the day before. W T F.

Minor gripe: the "flip 3d" thing they have is useless. It's less useful than the standard alt-tab. To add to the blogosphere whining, why didn't they just rip off Exposé from the Mac verbatim?

The Vista Basic Theme (low rent graphics):

You could just download a theme from deviantart or elsewhere for windows XP, and you quite literally wouldn't be able to tell the difference. It's debatable whether this theme is uglier than the default XP theme, but it's certainly not much better.

Performance (superfetch smart memory management and caching)

This is a bit strange, but overall I was suitably impressed. Example: when I'd play Warcraft III on windows XP, quitting would bring a 30s to 2 minute "crunch", while everything got paged around the place. On vista, with identical hardware, it's much more responsive.

Think of it like this:
In XP, when you load a large program, it gets loaded into ram, then 100% of your CPU/HDD resources are free for use. When you quit, or do some other memory intensive thing, the system takes a dump for a while to sort itself out.
In Vista, when you load a large program, it gets loaded, but only 95% of your CPU/HDD resources are free; the other 5% are used by superfetch tinkering away in the background. This means that when you quit (or do some other memory-intensive thing), all the stuff it's done in the background pays off, and your system just runs a bit slower instead of falling over.

Programs which you never run take about the same time to load, and basically everything performs the same. Frequently used stuff like firefox loads MUCH more quickly, even though...

Memory usage (oh noes! those horrible background tasks!)

From (my admittedly fuzzy) memory, a vanilla install of XP RTM would use about 90 meg of RAM. Post installing SP2, it would use about 180. After I did the customary beatdown of all the unnecessary services, it'd use about 120 megs.

Vista seemed to use about 300 megs of ram post install. After the services beatdown, it got down to 200 meg (not counting disk cache of course). Aero adds about 50 meg to this. However, the system overall seemed just as responsive as XP did - This is basically a testament to how good superfetch is, but vista still whores teh RAM.  Hopefully it won't be quite so bad when they RTM it, but I'm not holding my breath.

As an aside, this is probably why they don't let you run aero on a machine with 512MB of RAM: 300 for vista + 50 (at least) for aero doesn't leave much for applications. I can see the reasoning behind it, but instead of just disabling aero on low-RAM systems, there definitely needs to be an "I am not a retard" switch, so that I can use the free RAM I got by turning off all the useless background crap to run aero. As I said earlier, I KNOW aero runs fine on this PC. WTF.

Readyboost

I didn't test this on RC1 because I didn't have the memory stick then, but on RC2 I am using an entire 1GB usb2 memory stick for Readyboost. I can't provide any actual numbers or anything, but from my very limited experience, it seems to make a noticeable difference in responsiveness. I'm very happy with it, and I'm usually picky about these kinds of things.

PS: Note I said "responsiveness", not "performance". Stuff still runs the same, just those "Crunch" moments like when you load firefox or alt tab out of a game or other "beat the crap out of the pagefile/disk" moments are a whole heap better.

PPS: No, adding 1 gig of readyboost to a 512 MB system still doesn't let you run aero. bastards.

Other neat things

I had mp3s playing, and upgraded my graphics drivers without rebooting or even skipping a beat in the music. It thrashed for a while, the screen went black for a second or 2, and presto, new graphics drivers. Seriously impressed.

Being able to type arbitrary commands, like "net stop server", into the search box in the start menu and into the explorer address bar is awesome.

Windows explorer and media player now use the same format for album art as itunes, which is sweet.

The new task manager/performance stuff is great.

And the winner is!

Overall, I'd have to say my favourite part of vista thus far would have to be the new windows explorer. I like the new clickable address bar format. The revamped 'documents and settings' thing is so much nicer. The customizable 'favourite links' panel is great. The searching and filtering is brilliant. The new start menu is insanely good.
Oh, and not to mention the facts that a) it doesn't hang when you try to access network shares, and b) if you're copying/moving/deleting a bunch of files and one of them fails, it carries on with the rest instead of falling over like a useless pile of crap like XP and everything before it did. I could go on for hours. I love it. A million points to the shell team at MS.

And the loser is!

This is so cliché, I know, but user account control sucks. I agree with the principle in theory, but its implementation just seems to suck.

On the one hand you have things like copying a file to a "restricted" folder: you should get one popup asking for confirmation, but before that you also get a dialog warning you that if you continue you will be prompted for confirmation. They could quite literally rewrite it to the following text: "If you try and do this, we will annoy you with another dialog after this one. Are you sure you want us to pop the second dialog so you can click yes and be annoyed?"

On the other hand you have things where it just doesn't kick in. If I use explorer to copy and paste a file to a "restricted" folder, it pops me for confirmation and succeeds. If however I drag/drop the file, it just fails with 'access denied' and no prompt or way to get around it. It seems to have no awareness of things other than explorer in some situations too - I can't save files from firefox to some places; doing things from the command prompt pretty much just doesn't work, etc.

IMHO, it looks as if they haven't actually implemented UAC as part of the windows OS or API, they've just made administrators into restricted users, and made explorer dick around with permissions when it launches applications. My recommendation? Turn it off like everyone else, and wait another 5 years, maybe MS will get it right next time.

Conclusion

Vista overall is worth upgrading from XP. I don't know how much I'd pay for it, but it definitely is an overall improvement and there don't really seem to be many other downsides apart from the odd piece of software here and there. Almost everything runs just fine, and it's rock solid stable. Just remember to turn off UAC :-)

Monday, September 25, 2006

Function Pointers in C/C++ and boost::bind

In a previous blog entry, I showed the Windows QueueUserAPC function, and how you could use it to get other threads to execute functions. Now this was kind of cool, but if the only functions we can use are free functions which take only a single 32 bit parameter, it's not so useful. I said I'd explain how to use boost::bind to solve this.

Now, I'm not going to explain how this interacts with QueueUserAPC just yet, because explaining boost functions and bind is big enough for a blog entry of its own. Here it is.

Intro - Function Pointers in C and C++

If you're not familiar with what a function pointer even is, well:

  • All your code that you write gets turned into a big bunch of binary stuff by your compiler.
  • In order for this code to run, the binary stuff has to be loaded into memory.
  • Once the binary stuff is in memory, the CPU can be told to execute arbitrary bits of it. This is what C/C++ does behind the scenes when you normally call a function.
  • A function pointer in C/C++ is a pointer to a bit of that memory where a function lives.
  • When more code is loaded, or passed around from one part to another, you can get pointers to this new code as well as to the static stuff which you wrote up front.
  • This lets you do things like call functions which didn't exist when you compiled the program (e.g. in DLLs), or tell some code to execute a runtime-specified piece of other code on some event (e.g. callback functions).

If you're not familiar with function pointers in C, here's an example:

#include <iostream>

void Print( int a ) {
    std::cout << "Free Function, a = " << a << std::endl;
}

int main() {
    void( *x )(int) = &Print;
    // we now have an object x, which is a pointer to function of type void(*)(int).
    // it happens to be pointing at our Print function
    x( 5 ); //call it
}


This syntax is fine and dandy for C functions, which don't have classes or anything, so the types are all pretty simple. In C++, however, we have classes, which have member functions (or methods if you prefer to call them that). Imagine the following class:

#include <iostream>

class FooClass {
public:
    void Print( int a ) {
        std::cout << "A FooClass, param = "<< a <<" this = " << this << std::endl;
    }
};

Now, if we want to get a pointer to the Print function, we have to write some extra stuff so the compiler can tell that it's a member function of FooClass, and we also have to supply an instance when we call it, so it knows which FooClass to call Print on. We might have 50 FooClass objects in an array, and it's got to be able to figure out which one is the right one.

int main() {
    FooClass* myFoo = new FooClass(); //create an instance of our FooClass
    void( FooClass::* x )(int) = &FooClass::Print;
    // we now have an object called x, which is a pointer to function of type void(FooClass::*)(int)
    // it happens to be pointing at our FooClass::Print function, but it doesn't know which FooClass instance yet

    (myFoo->*x)( 5 );
    //call the function, telling it that the FooClass represented by myFoo is the one to use,
    //as if we'd called myFoo->Print( 5 );

    delete myFoo;
}

As you may well have noticed, using pointers to class members sucks. I've done a fair bit of this kind of thing and I had to go and look up the documentation to remember quite how it was supposed to work. C++ cops a lot of flak for this kind of stuff, and deservedly so in my humble opinion.

We could always make it nicer by using some typedefs, but it's still ugly, still probably wouldn't make sense to most novice programmers, and we still have the problem that the function pointer doesn't stand alone - we also need to pass the object instance around.

If they didn't know any better, but wanted to use this kind of thing, most programmers would probably end up creating some kind of struct which contained the instance, the function pointer, and some other arbitrary data, and passing that around the place. That at least makes it possible, but it still sucks.
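For comparison, a sketch of the kind of hand-rolled struct described above (FooCallback is a hypothetical name). It works, but every kind of callback needs its own struct, and the member-pointer syntax is still there at the call site.

```cpp
#include <iostream>

class FooClass {
public:
    void Print( int a ) {
        std::cout << "A FooClass, param = " << a << " this = " << this << std::endl;
    }
};

// Hypothetical hand-rolled callback: everything needed to make the call
// has to be bundled up and passed around together.
struct FooCallback {
    FooClass* instance;               // which FooClass to call Print on
    void ( FooClass::*method )(int);  // which member function to call
    int argument;                     // the "other arbitrary data"

    void Invoke() const { ( instance->*method )( argument ); }
};
```

Usage would look like `FooCallback callback{ &foo, &FooClass::Print, 5 }; callback.Invoke();`, where foo is some existing FooClass.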

boost::function

If you take the following 3 things:

  • Function pointer type declarations suck
  • You can do lots of amazing stuff with templates in C++
  • The guys who write the boost libraries are really really smart

Then, for free functions, we get this:
#include <iostream>
#include <boost/function.hpp>

void Print( int a ) {
    std::cout << "A Free function, param = "<< a << std::endl;
}

int main() {
    void( *oldFunc )(int) = &Print; //C style function pointer
    oldFunc( 5 );

    boost::function<void(int)> newFunc = &Print; //boost function
    newFunc( 5 );
}

It's just my opinion of course, but I think the C style function pointer with the variable name in between the void and the (int) is confusing, whereas the boost function just behaves how you'd expect it to. Code that clearly states what it does, and then simply does it, is the best code.

Even though this is a trivial example and the boost one isn't actually much nicer, I'd still use it just because it's easier to read. However, it still hasn't solved the problem of passing around the instance along with the function for C++ members...
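One practical place the readability shows up is in callback parameters. boost::function was later adopted into C++11 as std::function, so this sketch uses the standard version to build without Boost; Record, CallWithFiveOld and CallWithFiveNew are hypothetical names.

```cpp
#include <functional>
#include <vector>

std::vector<int> g_received;  // records each call so the behaviour can be checked

void Record( int a ) { g_received.push_back( a ); }

// A callback parameter spelled with a C style function pointer...
void CallWithFiveOld( void ( *callback )(int) ) { callback( 5 ); }

// ...and the same parameter spelled with std::function (boost::function in 2006).
void CallWithFiveNew( const std::function<void(int)>& callback ) { callback( 5 ); }
```

Unlike the raw function pointer, the std::function parameter will also accept function objects, which is what makes boost::bind useful.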

boost::bind

What boost::bind does is "bind" parameters, like an object instance (or a pointer to one), into a function object that you can store in a boost::function (how this actually happens is a bit beyond the scope of this article, so I won't go into it). Observe and rejoice:

#include <iostream>
#include <boost/bind.hpp>
#include <boost/function.hpp>

class FooClass {
public:
    void Print( int a ) {
        std::cout << "A FooClass, param = "<< a <<" this = " << this << std::endl;
    }
};

int main() {
    FooClass *myFoo = new FooClass();
    void( FooClass::* oldFunc )(int) = &FooClass::Print; //member function pointer
    (myFoo->*oldFunc)( 5 );

    boost::function<void(int)> newFunc = boost::bind( &FooClass::Print, myFoo, _1 ); //boost function
    newFunc( 5 );

    delete myFoo;
}

What is effectively happening, is that myFoo is being "bound" into the newFunc object. Think of it as creating a private variable inside newFunc and sticking myFoo in it. When newFunc is invoked, it will use myFoo as a parameter.

Boost is smart enough to figure out that we're passing in an instance of FooClass instead of just a number or string or whatever, so when the function is called, it will use that instance and do myFoo->Print( 5 ) automagically for you. This means we don't have to carry the instance around or worry about the awful syntax when we want to use it; we just bind it into the function object, and away we go.
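To demystify the "private variable" idea, here is a hypothetical, heavily simplified C++ class that does by hand what the bound function object does for this one case. It is not how Boost implements bind; FooClass has been given a lastParam member (and no output) purely so the call can be checked.

```cpp
#include <functional>

class FooClass {
public:
    int lastParam = 0;  // added for this sketch so the call can be checked
    void Print( int a ) { lastParam = a; }
};

// Hypothetical, much-simplified stand-in for the object boost::bind builds:
// the instance is stored in a private member and used when the object is called.
class BoundFooPrint {
public:
    explicit BoundFooPrint( FooClass* instance ) : instance_( instance ) {}
    void operator()( int a ) const { instance_->Print( a ); }
private:
    FooClass* instance_;
};

// Because it has operator(), it fits in a std::function just like a bind result.
std::function<void(int)> MakePrinter( FooClass* instance ) {
    return BoundFooPrint( instance );
}
```

The real bind generalises this to any function, any number of parameters, and any mix of bound and unbound ones.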

But but but, what's the _1?

Aha! Remember, boost::bind is not a "mysterious way to get functions to work", it's "bind a parameter into a function object." It can bind a variable of any type, so long as it is compatible with the parameter it is bound to. Also, it wants something supplied for every parameter of the function. This is also perfectly valid code:

boost::function<void()> newFunc = boost::bind( &FooClass::Print, myFoo, 6 );
newFunc(); //this will call Print with 6 as the 'a' parameter.

What the heck, I hear you say? We're calling the Print function, which takes one int as a parameter, but we're not passing any ints to it. This is just boost::bind doing its thing. Remember, it wants to bind every parameter it can.

...But if you bind every parameter, you can't supply them later!

This is what the _1 is for. It means "the first parameter, which will be supplied later". In our first example, we used _1 to indicate that we wouldn't bind that parameter into newFunc straight away, and that we would supply it later on when we invoke the function.
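The same two bindings with the C++11 standard library, where boost::bind was adopted as std::bind and the placeholders live in std::placeholders. This is a sketch that builds without Boost; FooClass is trimmed to record its parameter instead of printing it, and the two helper function names are hypothetical.

```cpp
#include <functional>

class FooClass {
public:
    int lastParam = 0;  // added for this sketch so the call can be checked
    void Print( int a ) { lastParam = a; }
};

// Like boost::bind( &FooClass::Print, foo, _1 ): 'a' is supplied later.
std::function<void(int)> BindInstanceOnly( FooClass* foo ) {
    return std::bind( &FooClass::Print, foo, std::placeholders::_1 );
}

// Like boost::bind( &FooClass::Print, foo, 6 ): every parameter is bound now.
std::function<void()> BindEverything( FooClass* foo ) {
    return std::bind( &FooClass::Print, foo, 6 );
}
```

Calling `BindInstanceOnly( &foo )( 5 )` sets foo.lastParam to 5, and `BindEverything( &foo )()` sets it to 6 without any argument at the call site.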

The cool thing about this is, you can mix and match them, and do all kinds of silly stuff, like this:

int MessageBox( HWND, char*, char*, int ); 
//Probably looks familiar to windows coders

boost::function<int( int, char*, char*, HWND )> reversed_params =
      boost::bind( &MessageBox, _4, _3, _2, _1 ); 
      //use the _ markers to move the parameters around

boost::function<int( char*, char* )> bind_some =
      boost::bind( &MessageBox, m_hWnd, _1, _2, 0 ); 
      // bind in some parameters, but leave others to be supplied later,
      // creating a function where the user must supply 2 params instead of 4 
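A portable version of the same tricks, using std::bind and a hypothetical FakeMessageBox (it formats its arguments into a string instead of showing a dialog) so the sketch builds without windows.h or Boost. The owner value 42 is arbitrary.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for MessageBox: owner, text, caption, flags,
// formatted into a string so the argument order is visible.
std::string FakeMessageBox( int owner, std::string text, std::string caption, int flags ) {
    return std::to_string( owner ) + "|" + text + "|" + caption + "|" + std::to_string( flags );
}

using namespace std::placeholders;

// Same four parameters, supplied in reverse order.
std::function<std::string( int, std::string, std::string, int )> reversed_params =
    std::bind( &FakeMessageBox, _4, _3, _2, _1 );

// Owner and flags bound now; text and caption supplied later.
std::function<std::string( std::string, std::string )> bind_some =
    std::bind( &FakeMessageBox, 42, _1, _2, 0 );
```

So `reversed_params( 7, "caption", "text", 1 )` ends up calling `FakeMessageBox( 1, "text", "caption", 7 )`.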

If you're a functional programming fanatic, you'll have been whinging about closures for years, and how languages that don't have them suck. boost::bind isn't a closure, but if you know what you're doing you can get pretty close to achieving the same functionality as one, which is pretty cool for C++.
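One example of the "pretty close": binding a reference to some state gives a callable that carries that state around with it. A minimal sketch using std::bind and std::ref (the C++11 descendants of boost::bind and boost::ref); AddTo and MakeAccumulator are hypothetical names.

```cpp
#include <functional>

// Hypothetical helper: adds 'amount' to whatever 'total' refers to.
void AddTo( int& total, int amount ) { total += amount; }

// std::ref makes bind store a reference to 'total' rather than a copy,
// so every call through the returned function updates the same variable.
std::function<void(int)> MakeAccumulator( int& total ) {
    return std::bind( &AddTo, std::ref( total ), std::placeholders::_1 );
}
```

Unlike a real closure, nothing keeps total alive for you: the bound reference dangles if the callable outlives the variable, which is the "if you know what you're doing" part.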

Anyway. I've spent ages writing this so hopefully someone will read it. Good Luck!

Sunday, September 17, 2006

Windows live writer is catastrophically broken

As with other Microsoft editors, live writer decides that it will kindly reformat your HTML for you. By 'reformat' I mean "remove all the line breaks".

Now, if you're just having a rant, like I am here, you really don't care. So long as the HTML isn't full of <p><p></p></p> like the output of so many other WYSIWYG editors, and doesn't have (god forbid) CSS classes like -mso-x-y-random-otherthing everywhere, I don't mind.

Except, that is, when you're trying to write a post which contains code, and that code is inside an element styled with white-space: pre;, like I am. Then you care a lot about programs that delete all your newlines and turn your clearly written six-line code example into a gibbering mess.

It also turns out that if you tell Live Writer to download your existing blog posts from Blogger, so you can spruce them up a bit with some WYSIWYG loving, it removes all their newlines too, so if you dare republish them, they'll be a gibbering mess as well.

So, I ask you, Windows Live Writer team: what the hell kind of good is the HTML editing mode if the app is just going to reformat and screw over any HTML you care to write? Why not just change the menu item to spawn a dialog which says "ALL YOUR HTML ARE BELONG TO US." and then quit the app? That at least would have saved me a bunch of time.

Windows Live Writer part 2

Well, it's actually _really_ good. Yesterday Blogger was throwing HTTP 500 errors like crazy, so I guess that's what was stopping it from working.

Anyway, I think I'll be using Live Writer for blogging from now on; it sure beats publishing via the website's text boxes.

More about C++ threading and other stuff coming soon...

Friday, September 15, 2006

This is a test of windows live writer beta

Apparently it's quite good, but well... if it works, perhaps.

Wednesday, September 13, 2006

Your friendly QueueUserAPC

I'm a reasonably active reader of programming.reddit.com, and lately there have been a whole ton of articles about concurrent programming, so here's my 2c to throw into the ring.

Basically, there seem to be two approaches to concurrent programming these days: the classic shared-memory/locking model, which the majority of applications use, and the 'Erlang-style' share-nothing/message-passing model. They both have their uses. If you were using Erlang you'd go with message passing; if you were using C# you'd use shared memory and locks. My job is mostly programming C++ apps on Windows, so I'd like to share what I think is a neat trick you can use when doing exactly that. This may be common knowledge, but I've not seen it mentioned on reddit or anywhere else, so I'm guessing it's not.

In Windows NT4 and up, native Windows threads have what is called an APC queue. APC stands for Asynchronous Procedure Call. The queue is used by the overlapped IO functions (ReadFileEx and WriteFileEx, amongst others) to deliver their completion routines. Basically, it is a list of function pointer/ULONG_PTR pairs.

So, you ask, what good might that be? Well, this APC queue gets processed whenever the thread enters what is called an 'alertable wait state'. "Alertable wait state" is a fancy name for "has called SleepEx, WaitForSingleObjectEx, or one of the many other Wait*Ex functions with its bAlertable parameter set to TRUE". So wherever you'd call a WaitForSomething, you call WaitForSomethingEx instead, and Windows automagically pulls things off the APC queue and executes them.

The QueueUserAPC function allows you to insert your own functions into this queue. In a nutshell, it says "execute this callback function in this thread". If you haven't clicked on why that's so cool, bear with me... Observe the following program.
If you're a Windows C/C++ coder, it will probably look somewhat familiar:

#include <windows.h>
#include <iostream>
#include <list>

std::list<int> g_listOfInts;

DWORD WINAPI ThreadProc( LPVOID param )
{
    g_listOfInts.push_back( 7 );
    return 0;
}

void AddToList( int param )
{
    g_listOfInts.push_back( param );
}

void PrintList()
{
    std::list<int>::iterator iter;
    for( iter = g_listOfInts.begin(); iter != g_listOfInts.end(); ++iter )
        std::cout << *iter << std::endl;
}

int main()
{
    DWORD dwThreadId;
    HANDLE hSecondThread = CreateThread( NULL, 0, ThreadProc, NULL, 0, &dwThreadId );
    AddToList( 5 );
    //... do some other stuff
    PrintList();
    CloseHandle( hSecondThread );
    return 0;
}

Now, this code of course has a big nasty race condition. Both threads insert into the list, and they could do it at the same time, which could leave the list in an invalid state, crash the program, corrupt memory, or cause any number of other bad things. The second thread could also modify the list while the iterator in PrintList is looping over it, which is just as problematic.

The classic solution is to lock the list. You could use a Windows CRITICAL_SECTION, a boost::mutex::scoped_lock, or dozens of other things, which all boil down to "if a thread wants to look at this object, it must first wait for any other threads which might be looking at or modifying it." The catch is that if everything that needs to access the object must wait for everything else, we effectively serialise access to that variable down to one thread at a time. If we've got 32 CPUs running 32 threads which all spend most of their time working with the list, 31 of them will be sitting on our lock at any given moment, and we get little or no performance improvement over just running a single thread.

The 'non-shared' solution would be to somehow enforce that one thread "owns" the list. If any other threads want to get any data from it, they must send a message to the "owner" thread, and it must reply. This is probably trivial in something like Erlang, but I don't know Erlang, so I can't comment.
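For comparison, the classic locking fix described above might look something like this in standard C++, with std::mutex and std::lock_guard standing in for a CRITICAL_SECTION or boost::mutex::scoped_lock. It's a portable sketch rather than the Windows version, and SnapshotList and RunLockedDemo are hypothetical names.

```cpp
#include <list>
#include <mutex>
#include <thread>
#include <vector>

std::list<int> g_listOfInts;
std::mutex g_listMutex;  // every access to g_listOfInts must hold this

void AddToList(int value) {
    std::lock_guard<std::mutex> lock(g_listMutex);  // released at end of scope
    g_listOfInts.push_back(value);
}

// Copy the list out under the lock so no thread can modify it mid-iteration.
std::vector<int> SnapshotList() {
    std::lock_guard<std::mutex> lock(g_listMutex);
    return std::vector<int>(g_listOfInts.begin(), g_listOfInts.end());
}

std::vector<int> RunLockedDemo() {
    std::thread second([] { AddToList(7); });
    AddToList(5);
    second.join();          // wait for the second thread before reading
    return SnapshotList();  // {5, 7} or {7, 5}, depending on scheduling
}
```

Every touch of the list goes through g_listMutex, which is exactly the serialisation the paragraph above complains about.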
In Windows/C++, you've got trusty old Windows messages (using MSG, PeekMessage, etc, like you'd have in any Windows GUI app). However, if you were to use this approach for any nontrivial program you'd end up creating hundreds of GET_X and GET_Y messages, and it would soon become unmanageable.

QueueUserAPC to the rescue! Look at the next program.

#include <windows.h>
#include <iostream>
#include <list>

std::list<int> g_listOfInts;
HANDLE g_terminateSignal;

DWORD WINAPI ThreadProc( LPVOID param )
{
    g_listOfInts.push_back( 7 );
    //APCs can execute inside this loop while we wait for the quit signal.
    //An alertable wait returns WAIT_IO_COMPLETION after running queued APCs.
    while( WaitForSingleObjectEx( g_terminateSignal, INFINITE, TRUE ) == WAIT_IO_COMPLETION )
        ;
    return 0;
}

void CALLBACK ApcAddToList( ULONG_PTR param )
{
    g_listOfInts.push_back( (int)param );
}

void CALLBACK ApcPrintList( ULONG_PTR param )
{
    std::list<int>::iterator iter;
    for( iter = g_listOfInts.begin(); iter != g_listOfInts.end(); ++iter )
        std::cout << *iter << std::endl;
}

int main()
{
    g_terminateSignal = CreateEvent( NULL, TRUE, FALSE, NULL );
    DWORD dwThreadId;
    HANDLE hSecondThread = CreateThread( NULL, 0, ThreadProc, NULL, 0, &dwThreadId );
    QueueUserAPC( ApcAddToList, hSecondThread, 5 );
    QueueUserAPC( ApcPrintList, hSecondThread, 0 );
    //magically assume we've written some code to wait for a WM_QUIT
    //and set g_terminateSignal when we get it, then:
    WaitForSingleObject( hSecondThread, INFINITE );
    CloseHandle( hSecondThread );
    CloseHandle( g_terminateSignal );
    return 0;
}

So, what's the difference here (apart from the code looking all weird and different)? Well, we create our second thread, which adds something to the list like it did last time, but instead of exiting, it calls WaitForSingleObjectEx on the terminate signal with bAlertable set to TRUE. It's now in an alertable wait state. Also, our main function doesn't touch the list any more. The program is written so the second thread "owns" the list: if the main thread wants to modify it, it must make that happen in the second thread. In this example, first ApcAddToList and then ApcPrintList are queued to the second thread (which is in its alertable wait state), where they are executed.
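The ownership pattern itself isn't Windows-specific. Here's a rough portable model of it in standard C++, assuming a hypothetical WorkQueue class in place of the thread's APC queue and std::function tasks in place of APC callbacks. The queue has its own internal lock (the kernel protects the real APC queue too), but g_listOfInts is only ever touched by the owner thread.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <list>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread's APC queue.
class WorkQueue {
public:
    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    void RequestStop() {  // plays the role of SetEvent(g_terminateSignal)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
    }

    // Rough analogue of the WaitForSingleObjectEx(..., TRUE) loop: run queued
    // tasks in FIFO order; return once a stop is requested and the queue is empty.
    void RunUntilStopped() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // runs on the owner thread, outside the queue lock
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stop_ = false;
};

std::list<int> g_listOfInts;  // only ever touched by the owner thread
std::vector<int> g_printed;   // what the "print" task saw; read after join

std::vector<int> RunOwnerDemo() {
    WorkQueue queue;
    std::thread owner([&queue] {
        g_listOfInts.push_back(7);
        queue.RunUntilStopped();
    });
    queue.Post([] { g_listOfInts.push_back(5); });
    queue.Post([] { g_printed.assign(g_listOfInts.begin(), g_listOfInts.end()); });
    queue.RequestStop();
    owner.join();  // join makes g_printed safe to read here
    return g_printed;
}
```

Because the owner thread adds 7 before it starts draining the queue, the print task always sees {7, 5}. This sketch drains every task already queued before it honours the stop request.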
Because everything involving the list only ever happens in thread 2, we no longer have our race condition. We don't have to lock the list at all either: instead of the threads waiting for each other to get at the locked memory, they are free to carry on doing other stuff while thread 2 does whatever it needs to. It's just as if we'd written it in one of those fancy concurrent no-shared-state languages, but without having to rewrite our entire codebase. Cool, no?

PS: If you're thinking "That's cool, but how can it be useful, given that the APC callback has to be a free C-style function and only gets one pointer-sized ULONG_PTR parameter (32 bits on 32-bit Windows)...", the answer will come soon.

PPS: For those of you who can't wait for me to explain how it can be more useful, go and look at boost::bind.
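On the PS: one common way around the single-parameter limit (not necessarily the approach the follow-up post will take) is to heap-allocate the bound call and pass its address through the integer argument. The sketch below is portable, so ApcFunc is a hypothetical typedef mirroring the shape of PAPCFUNC, std::uintptr_t stands in for ULONG_PTR, and AddToList is a hypothetical target; on Windows the trampoline would also need to be declared CALLBACK.

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <utility>

// Hypothetical mirror of PAPCFUNC: a free function taking one pointer-sized integer.
typedef void (*ApcFunc)(std::uintptr_t);

std::list<int> g_listOfInts;  // hypothetical target data
void AddToList(int value) { g_listOfInts.push_back(value); }

// Trampoline: turn the integer back into the heap-allocated call, run it,
// and free it (unique_ptr deletes it when this function returns).
void InvokeBoundCall(std::uintptr_t data) {
    std::unique_ptr<std::function<void()>> call(
        reinterpret_cast<std::function<void()>*>(data));
    (*call)();
}

// Pack any bound call into the (callback, integer) pair an APC can carry.
std::pair<ApcFunc, std::uintptr_t> PackCall(std::function<void()> call) {
    std::function<void()>* heapCall = new std::function<void()>(std::move(call));
    return std::make_pair(&InvokeBoundCall, reinterpret_cast<std::uintptr_t>(heapCall));
}
```

For example, PackCall(std::bind(&AddToList, 5)) yields a pair whose first member, called with the second, adds 5 to the list. On Windows you'd hand those two members to QueueUserAPC as the callback and the data. If QueueUserAPC fails, the trampoline never runs, so the caller must delete the heap object itself to avoid a leak.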