++blog: October 2006

Tuesday, October 17, 2006

Programming best practices

I was going to call this "X secrets of highly effective developers", like some other people, only these things shouldn't be secrets. Note that this is, as always, my not-so-humble opinion, so it is entirely likely that this article is either a) misguided, b) missing things, or c) entirely wrong, but I can't give you anyone else's opinion, now can I?
These are all typical clichés, but I'd like to try to explain them anyway, just as a brain dump.

1. Code for people, not computers.

This is really the absolute number one goal. Everything else can be taken as a corollary to this.

If you don't code for people, you are writing unmaintainable code. However, it's easy to throw the term around like a lot of other slogans/buzzwords without actually having a solid understanding of it. What this means to me is, in fact, "Try and write your programs as if they were plain English."

Well, you know, not quite English, because English has its own giant set of problems too, but the point I'm trying to make is that someone else should be able to read your code, and it should flow as if it has topics, headings, sentences, paragraphs, and so on. You should read it like a book, not decipher it like a code. In fact, "code" is a crap word; we should call it something like "instructions" instead.

How do we do this? With years of experience, and a constant drive to learn and apply new things.
But for now, here are a couple of pointers which have helped me so much that I feel like slapping other developers who don't follow them:

2. Use good names

This is obviously a subset of #1, but if I had to pick the most important thing, this would be it. Again, "use good names" is a meaningless phrase on its own; what you should actually do is "call things what they are". Whenever you have a variable/function/class/whatever, ask yourself "What actually is it? A buffer? A file handle? A person? What?", and then call it that. Simple, but overlooked by most programmers. Other developers should almost always be able to look at a variable/function/class and correctly guess what it is and what it might do; otherwise, you probably have a bad name.
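A small, hypothetical illustration of "call things what they are" (the functions and the track data are invented for this example): both functions below do exactly the same thing, but only the second one tells you what it is working on.

```python
# Before: the names say nothing about what the data actually is.
def proc(d, x):
    r = []
    for i in d:
        if i[1] > x:
            r.append(i[0])
    return r


# After: call things what they are. The data is a list of
# (title, duration_in_seconds) pairs, and we want the titles of the long ones.
def titles_longer_than(tracks, min_seconds):
    return [title for title, seconds in tracks if seconds > min_seconds]


tracks = [("Prayer", 219), ("Intro", 45)]
print(titles_longer_than(tracks, 60))  # ['Prayer']
```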

This is doubly useful because sometimes you will have trouble expressing what something actually is. Sometimes this is just not being able to think of the right word (go learn English :-D ), but more often than not, it is a strong signal that your design isn't correct, or that you have accidentally gone down the twisty path towards a tangled mess of garbage.
For example, if you can't think of a single good name for a class or function, it's probably because it's doing more than one job, and should be broken up into 2 classes/functions.
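As a rough sketch of that signal in practice (all names here are hypothetical): the first function only gets a vague name because it both adds up the data and formats it for display. Split along those two jobs, each half names itself.

```python
# Hard to name: it both totals the amounts AND formats them for display.
def do_report_stuff(lines):
    totals = {}
    for line in lines:
        name, amount = line.split(",")
        totals[name] = totals.get(name, 0) + int(amount)
    return "\n".join(f"{name}: {total}" for name, total in sorted(totals.items()))


# Split into its two jobs, the names come for free.
def sum_amounts_by_name(lines):
    totals = {}
    for line in lines:
        name, amount = line.split(",")
        totals[name] = totals.get(name, 0) + int(amount)
    return totals


def format_totals(totals):
    return "\n".join(f"{name}: {total}" for name, total in sorted(totals.items()))


lines = ["alice,3", "bob,5", "alice,2"]
print(format_totals(sum_amounts_by_name(lines)))
# alice: 5
# bob: 5
```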

At the end of the day, if you can't even think of a reasonable name for a thing which makes your program work, then how will anyone else (usually you, 6 months later) ever hope to understand it?

3. Use abstraction

This can be "bottom up" programming, "top down", or whatever design methodology is in fashion at the time, but the important thing is that you build code on top of other code, not alongside it. You are creating a pyramid, not tiling a floor.
That may not have made much sense, so to try to explain it a bit better, think of writing a program that opens a file, writes some string to it, and closes the file.

If you were to use the all-too-common floor-tiling method, you'd have one big long function (or lots of small functions running in sequence, whatever), which would do the following:
Allocate a file handle -> call the API open function -> allocate the memory for the string -> keep track of how many bytes we have -> write the memory to the file using the API file-writing function -> close the file -> free the string memory.
All the operations are on the same "level"; the stuff is just happening in a big row, like laying down tiles next to each other.

If we were to write the above using abstractions, we'd instead have a file class and a string class. The file class would deal with the file API (handles and stuff), and the string class would deal with the string memory. Then, instead of allocating handles and memory itself, our program can just deal with file and string objects. It would do the following:
Create a file object -> Create a string object -> call file.write( string ) -> cleanup done automatically by objects.
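In Python terms, a rough sketch of the two styles might look like this. The low-level `os.open`/`os.write`/`os.close` calls stand in for a raw file API, and the built-in `open()` file object plays the file class; the function names are hypothetical. Python strings already manage their own memory, so only the file half of the example carries over.

```python
import os


# "Floor tiling": every step sits at the same low level.
def save_greeting_tiled(path, name):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # get a raw handle
    try:
        data = ("Hello, " + name + "\n").encode("utf-8")  # build the raw bytes ourselves
        written = 0
        while written < len(data):  # keep track of how many bytes have gone out
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)  # and remember to close the handle ourselves


# "Pyramid": build on a file object that owns the handle and the encoding.
def save_greeting(path, name):
    with open(path, "w", encoding="utf-8") as f:  # closed automatically on exit
        f.write(f"Hello, {name}\n")
```

The second version is what lets you keep stacking: a caller of `save_greeting` no longer thinks about handles at all.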

When you write programs correctly with abstraction, you can stack the abstractions on top of each other, eventually leading to code that is almost like pseudocode or a domain-specific language.
This is what object-oriented programming and a lot of other techniques are actually for (as opposed to the common retarded view of inexperienced programmers that OO isn't being done correctly unless all your classes use inheritance somehow).

4. Do the simplest thing that can possibly work

Now, before I get branded as an agile zealot: not everything from agile is actually bad. What this actually means is: do the right thing, but do it in the simplest and smallest way you can. Don't write code which doesn't directly help you get things done, or which tries to solve problems you don't actually have.
This doesn't just apply to your higher-level design, but to the low level too. Functions/classes/interfaces/etc. should all be as simple as possible, and do only what they need to.

A classic example of doing the wrong thing here is building a big pile of classes and interfaces and message-handling code before you actually attack your main problem. Yes it's fun to build frameworks and architecture, but at the end of the day, it'll probably just get in the way.
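To make the contrast concrete, here is a minimal, hypothetical sketch (`ExporterRegistry` and `rows_to_csv` are invented for this example). The registry is the kind of speculative plumbing that tends to get built before the real problem is attacked; the plain function just does the one job we actually have, using the standard `csv` module.

```python
import csv
import io


# Built up front: a registry of pluggable exporters, for formats nobody
# has asked for yet. It does nothing useful until someone writes an exporter.
class ExporterRegistry:
    def __init__(self):
        self._exporters = {}

    def register(self, fmt, exporter):
        self._exporters[fmt] = exporter

    def export(self, fmt, rows):
        return self._exporters[fmt](rows)


# The simplest thing that can possibly work: the one format we actually need.
def rows_to_csv(rows):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(rows_to_csv([["artist", "song"], ["Disturbed", "Prayer"]]))
```

If a second format ever shows up, that is the time to refactor towards something like the registry, driven by real code rather than guesses.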

5. Don't repeat yourself, refactor instead.

If you are like lots of developers I've seen, and believe the best way to start a new program/library/class is to find a similar one and copy/paste it into a new file, STOP NOW. BAD PROGRAMMER! SIT!

If you find yourself writing the exact same code twice, refactor it into a common function or class.

If you find yourself writing similar code twice, refactor the common bits into another function or class (generics, dynamic types or other ways of dodging verbose typing, and first-class functions are a big win here), and have the remaining different bits as small and clear as possible.
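A small, hypothetical example of pulling the common bits out (the function names and song durations are invented): two near-copies differ only in the test they apply, so the shared loop moves into one function and the differing bit is passed in as a first-class function.

```python
# Two near-copies: identical except for the test applied to each item.
def count_long_songs(durations):
    count = 0
    for seconds in durations:
        if seconds > 300:
            count += 1
    return count


def count_short_songs(durations):
    count = 0
    for seconds in durations:
        if seconds < 120:
            count += 1
    return count


# Refactored: the shared loop lives in one place, and the part that
# differs is passed in as a function.
def count_matching(items, predicate):
    return sum(1 for item in items if predicate(item))


durations = [90, 219, 360, 45]
print(count_matching(durations, lambda s: s > 300))  # 1
print(count_matching(durations, lambda s: s < 120))  # 2
```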

6. Good design does not come from 'design,' but from refactoring.

If you do all the other stuff, you'll probably find yourself left with a ton of small functions and classes, with all your other classes using them. This is already better than lots of giant functions which duplicate code and functionality, but it can get a bit messy if you don't clean them up.
Most likely, however, a bunch of your helper functions will all take similar parameters. These are prime candidates for making a new class. Remember also that not everything should be a class. It's fine to have a bunch of global functions in a namespace (or a static class if you're stuck in C# or Java), if that's the nicest way to think about those kinds of objects. The main goal is to always try to make sure that your helper/library functions are as simple, clean, and useful as possible, and are arranged in clear and obvious groups. You can even write unit tests for them :-)
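A minimal sketch of "helpers that take similar parameters become a class" (the `Table` class and its helpers are hypothetical): every free function drags the same `rows` and `header` around, so those become the state of a small object.

```python
# Before: free helper functions that all drag the same parameters around.
def column_values(rows, header, column):
    index = header.index(column)
    return [row[index] for row in rows]


def column_max(rows, header, column):
    return max(column_values(rows, header, column))


# After: the shared parameters become the state of a small class.
class Table:
    def __init__(self, header, rows):
        self.header = header
        self.rows = rows

    def column_values(self, column):
        index = self.header.index(column)
        return [row[index] for row in self.rows]

    def column_max(self, column):
        return max(self.column_values(column))


table = Table(["song", "seconds"], [["Prayer", 219], ["Intro", 45]])
print(table.column_max("seconds"))  # 219
```

Small, focused pieces like these are also the easiest things in the codebase to unit test.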

Sooner or later, you'll probably end up with either some kind of framework (to my knowledge, this is how Rails came about), or a set of general classes, like the .NET base class library, but for dealing with the kinds of problems your company faces, or perhaps both.
This is excellent, as you'll be able to re-use these things over and over again in future, meaning next time you have to write that stupid app which has to read the registry and write to files, you'll be able to use your framework/library code and do it in 5 minutes instead of a day. Your boss will love you, and you won't mind programming in C++ anymore because you can actually get things done now.

Also, because you'll have created this framework/library code based on other working code, and based on what you actually need to do, and refactored it to best fit your problems as you go, you'll have stuff which is actually useful and good.
People are stupid, and 99% of us can't design our way out of a paper bag. Most frameworks that get 'designed' up front wind up completely missing the point and solving the wrong problems in the wrong way. But by keeping existing code simple, clean, and non-repeating, and by constantly refactoring it, we can end up with well-structured and maintainable code anyway. Owzat? :-)

Sunday, October 15, 2006

Diffing iTunes music libraries with IronPython

Intro:

Ages ago, when I first played with Python, I found it was pretty cool in a weird sort of way, and that I could probably love it once I had spent 6 months bending my brain around its quirky bits (too many underscores, significant whitespace, inconsistent and strange libraries)... And then soon after, I found Ruby, which had none of the above problems, so I didn't bother learning any more Python...

Until a few weeks ago, when IronPython was released.

For the uninitiated, IronPython is Python which runs on and inside the .NET CLR (or Mono, just not as quickly). My biggest blocker with regular Python was the built-in libraries/docs. I'm bound to be flamed, but the ones I looked at (file access, networking, HTTP, etc.) just were not intuitive. IronPython, however, is a dream, because it uses all the .NET libraries instead (or as well, if you like). I'm pretty familiar with the .NET BCL, so this was great.

Actual content :-)

During my ~4 years at my current job, I have built up a large collection of music which I'd listen to. I also have a large collection of music at home. As I'm leaving in 3 weeks, and will lose all that data, I wanted to take a copy of the music from my work computer home.

The problem with this is that I only have a 6 GB iPod mini to transfer the songs with, so I can't just copy them all. I needed to diff the two music libraries and only copy the songs that I don't already have at home. iTunes exports a large, hairy pile-of-crap XML file when you ask it to export its library, so I thought this would be an opportunity to play about with some IronPython, and to post the result on the net in case it's useful to anyone else.
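At its core the job is a dictionary difference: key every track by "Artist: Name", then keep the work entries whose key has no match at home. Since the iTunes export is an Apple property list, a rough sketch of the same idea in plain CPython (not the IronPython script this post is about) can lean on the standard `plistlib` module. This assumes the usual iTunes layout of a top-level "Tracks" dictionary whose entries carry "Name", "Artist" and "Location" keys; the function names are hypothetical.

```python
import plistlib


def tracks_by_name(plist_bytes):
    """Map "Artist: Name" -> Location for every track in an iTunes XML export.

    Assumes the usual iTunes layout: a top-level "Tracks" dict whose values
    are per-track dicts with "Name", "Artist" and "Location" keys.
    """
    library = plistlib.loads(plist_bytes)
    songs = {}
    for track in library.get("Tracks", {}).values():
        if "Location" not in track:
            continue  # e.g. streams, which have no local file to copy
        key = "%s: %s" % (track.get("Artist", "Unknown Artist"), track.get("Name", ""))
        songs[key] = track["Location"]
    return songs


def songs_missing_at_home(home_bytes, work_bytes):
    home = tracks_by_name(home_bytes)
    work = tracks_by_name(work_bytes)
    # The dictionary difference: work entries whose key is not at home.
    return {song: path for song, path in work.items() if song not in home}
```

Usage would be along the lines of reading both exported files in binary mode (`open(..., "rb").read()`) and passing the bytes to `songs_missing_at_home`.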

Here it is; the comments are the documentation :-). Hopefully it's useful, if only as a quick demo of how things work in IronPython.

```python
# Import all the libraries we'll need
import clr
clr.AddReferenceByPartialName("System.Xml")
from System.IO import StreamWriter
from System.Xml import XmlDocument

# Create a helper function to convert the iTunes XML file into a hash so it's actually useful
# An example of one of the hash entries: ret['Disturbed: Prayer'] = 'file://C:/path/disturbed_prayer.mp3'
def fileToHash(fileName):
    ret = {}
    doc = XmlDocument()
    doc.Load(fileName)
    # The XPath is ugly... Export your iTunes library and take a look to see why
    for elem in doc.DocumentElement.SelectNodes("/plist/dict/dict/dict"):
        # song name is always the first <string>
        song = elem.SelectSingleNode("string[1]/text()").Value
        # artist is always the second <string>
        artist = elem.SelectSingleNode("string[2]/text()").Value
        # Unfortunately the filename is not fixed in the structure, so we have to
        # find <key>Location</key> and then move to the next element after it
        path = elem.SelectSingleNode("key[text() = 'Location']").NextSibling.FirstChild.Value
        # Add it to our return-param
        ret["%s: %s" % (artist, song)] = path
    return ret

# Parse both files into separate hashes
homeSongs = fileToHash("itunes library home.xml")
workSongs = fileToHash("itunes library work.xml")

# Create a new dict containing only songs that are at work but NOT at home
diffSongs = dict([(song, workSongs[song]) for song in workSongs if song not in homeSongs])

# Write them all to the output file
# The format of the output file is: file://C:/path/disturbed_prayer.mp3 # Disturbed: Prayer
# The reason I've done it like this with the path first is hopefully to make it easier
# for another script to be able to use it as a list of filenames to copy...
writer = StreamWriter("diffSongs.txt")
for song in diffSongs:
    writer.WriteLine("%s # %s" % (diffSongs[song], song))
# Close the writer so the buffered output actually gets flushed to disk
writer.Close()
```

Disclaimer:
1) This was meant to be quick to write; I didn't care about run-time performance or any other nifty tricks. It only takes 3 seconds to parse two 3.5 MB XML files, and that's good enough for me.
2) Sorry about the lack of syntax highlighting; I couldn't find any decent way to do it short of screenshotting my PSPad window.
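The script's comments mention using diffSongs.txt as a list of filenames for another script. A minimal, hypothetical sketch of that follow-up in plain Python, assuming the `<location> # <Artist: Song>` line format the script writes, Windows-style `file://` or `file://localhost/` locations with spaces percent-encoded (so the " # " separator can't occur inside a location), and a UTF-8 file; `parse_diff_line` and `copy_listed_songs` are invented names:

```python
import os
import shutil
from urllib.parse import unquote


def parse_diff_line(line):
    """Split 'file://... # Artist: Song' into (local_path, song).

    Assumes Windows-style locations such as file://localhost/C:/... or
    file://C:/..., with spaces percent-encoded as %20.
    """
    location, _, song = line.rstrip("\r\n").partition(" # ")
    path = location
    for prefix in ("file://localhost/", "file://"):
        if path.startswith(prefix):
            path = path[len(prefix):]
            break
    return unquote(path), song


def copy_listed_songs(list_file, dest_dir):
    """Copy every file named in list_file into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    # utf-8-sig reads the file correctly whether or not it starts with a BOM
    with open(list_file, encoding="utf-8-sig") as f:
        for line in f:
            if line.strip():
                path, _song = parse_diff_line(line)
                shutil.copy(path, dest_dir)
```

For example, `parse_diff_line("file://C:/path/disturbed_prayer.mp3 # Disturbed: Prayer")` gives `("C:/path/disturbed_prayer.mp3", "Disturbed: Prayer")`.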

Wednesday, October 11, 2006

Vista RC1 and RC2: Impressions

Over the last couple of days I've installed both Vista RC1 and RC2 at home, and today I installed Vista RC2 at work. Here's a quick brain dump of my thoughts about various things that have cropped up. Except where I explicitly say so below, RC1 and RC2 are pretty much the same.

Aero Glass (fancy graphics):

My home machine is an Athlon 64 3200+ with 512 MB of RAM and a Radeon 9700 graphics card with 128 MB. The graphics card is the main thing that gets hit by Aero, and in RC1, Aero wasn't enabled by default after the install. A quick trip to the control panel to turn it on, and it was fine. I really liked it. Sure, it's just eye candy, but who said computers have to be ugly? All in all, I was very happy...

HOWEVER: RC2 decides that I need 1 GB of RAM to enable Aero. I haven't been able to find any overrides as of yet. This is stupid. Someone in the Microsoft marketing department has gotten their nose into this or something, because I KNOW Aero runs fine on this machine with 512 MB of RAM, having run it just the day before. W T F.

Minor gripe: the "Flip 3D" thing they have is useless. It is less useful than the standard alt-tab. To add to the blogosphere whining, why didn't they just rip off Exposé from the Mac verbatim?

The Vista Basic Theme (low rent graphics):

You could just download a theme for Windows XP from deviantART or elsewhere, and you quite literally wouldn't be able to tell the difference. It's debatable whether this theme is uglier than the default XP theme, but it's certainly not much better, that's for sure.

Performance (superfetch smart memory management and caching)

This is a bit strange, but overall I was suitably impressed. Example: when I played Warcraft III on Windows XP, quitting would bring a 30-second to 2-minute "crunch" while everything got paged around the place. On Vista, with identical hardware, it's much more responsive.

Think of it like this:
In XP, when you load a large program, it gets loaded into RAM, then 100% of your CPU/HDD resources are free for use. When you quit, or do some other memory-intensive thing, the system takes a dump for a while to sort itself out.
In Vista, when you load a large program, it gets loaded, but only 95% of your CPU/HDD resources are free, the other 5% are used by superfetch tinkering away in the background - which means that when you quit (or other memory intensive thing), all the stuff it's done in the background pays off and your system just runs a bit slower instead of just falling over.

Programs which you never run take about the same time to load, and basically everything performs the same. Frequently used stuff like Firefox loads MUCH more quickly, even though...

Memory usage (oh noes! those horrible background tasks!)

From (my admittedly fuzzy) memory, a vanilla install of XP RTM would use about 90 meg of RAM. Post installing SP2, it would use about 180. After I did the customary beatdown of all the unnecessary services, it'd use about 120 megs.

Vista seemed to use about 300 megs of RAM post install. After the services beatdown, it got down to 200 megs (not counting disk cache, of course). Aero adds about 50 megs to this. However, the system overall seemed just as responsive as XP did. This is basically a testament to how good SuperFetch is, but Vista still whores teh RAM. Hopefully it won't be quite so bad when they RTM it, but I'm not holding my breath.

As an aside, this is probably why they don't let you run Aero on a machine with 512 MB of RAM: 300 for Vista + 50 (at least) for Aero leaves only about 160 MB for applications. I can see the reasoning behind it, but instead of just disabling Aero on low-RAM systems, there definitely needs to be an "I am not a retard" switch, so that I can use the free RAM I got by turning off all the useless background crap to run Aero. As I said earlier, I KNOW Aero runs fine on this PC. WTF.

Readyboost

I didn't test this on RC1 because I didn't have the memory stick then, but on RC2 I am using an entire 1 GB USB 2.0 memory stick for ReadyBoost. I can't provide any actual numbers, but from my very limited experience, it seems to make a noticeable difference in responsiveness. I'm very happy with it, and I'm usually picky about these kinds of things.

PS: Note I said "responsiveness", not "performance". Stuff still runs the same, but those "crunch" moments, like when you load Firefox or alt-tab out of a game, or other "beat the crap out of the pagefile/disk" moments, are a whole heap better.

PPS: No, adding 1 GB of ReadyBoost to a 512 MB system still doesn't let you run Aero. Bastards.

Other neat things

I had MP3s playing, and upgraded my graphics drivers without rebooting or even skipping a beat in the music. It thrashed for a while, the screen went black for a second or two, and presto, new graphics drivers. Seriously impressed.

Being able to type arbitrary commands, like "net stop server", into the search box in the Start menu and into the Explorer address bar is awesome.

Windows Explorer and Media Player now use the same format for album art as iTunes, which is sweet.

The new task manager/performance stuff is great.

And the winner is!

Overall, I'd have to say my favourite part of Vista thus far is the new Windows Explorer. I like the new clickable address bar format. The revamped 'Documents and Settings' thing is so much nicer. The customizable 'favourite links' panel is great. The searching and filtering is brilliant. The new Start menu is insanely good.
Oh, and not to mention that a) it doesn't hang when you try to access network shares, and b) if you're copying/moving/deleting a bunch of files and one of them fails, it carries on with the rest instead of just falling over like a useless pile of crap, like XP and everything before it did. I could go on for hours. I love it. A million points to the shell team at MS.

And the loser is!

This is so cliché, I know, but User Account Control sucks. I agree with the principle in theory, but its implementation just seems to suck.

On the one hand, you have things like copying a file to a "restricted" folder: you should get one popup asking for confirmation, but before that you also get a dialog warning you that if you continue, you will be prompted for confirmation. They could quite literally rewrite it as: "If you try and do this, we will annoy you with another dialog after this one. Are you sure you want us to pop up the second dialog so you can click yes and be annoyed?"

On the other hand, you have things where it just doesn't kick in. If I use Explorer to copy and paste a file to a "restricted" folder, it prompts me for confirmation and succeeds. If, however, I drag and drop the file, it just fails with 'access denied' and no prompt or way to get around it. It seems to have no awareness of things other than Explorer in some situations too: I can't save files from Firefox to some places, doing things from the command prompt pretty much just doesn't work, etc.

IMHO, it looks as if they haven't actually implemented UAC as part of the Windows OS or API; they've just made administrators into restricted users, and made Explorer dick around with permissions when it launches applications. My recommendation? Turn it off like everyone else, and wait another 5 years; maybe MS will get it right next time.

Conclusion

Overall, Vista is worth upgrading to from XP. I don't know how much I'd pay for it, but it is definitely an improvement, and there don't really seem to be many other downsides apart from the odd piece of software here and there. Almost everything runs just fine, and it's rock-solid stable. Just remember to turn off UAC :-)