Sunday, October 15, 2006

Diffing iTunes music libraries with IronPython

Intro:

Ages ago when I first played with Python, I found it was pretty cool, in a weird sort of way, and that I could probably love it once I had spent 6 months bending my brain around its quirky bits (too many underscores, significant whitespace, inconsistent and strange libraries)... And then soon after, I found Ruby, which had none of the above problems, so I didn't bother learning any more Python...

Until a few weeks ago, when IronPython was released.

For the uninitiated, IronPython is Python which runs on and inside the .NET CLR (or Mono, just not as quickly). My biggest blocker for regular Python was the built-in libraries/docs. I'm bound to be flamed, but the ones I looked at (file access, networking, HTTP, etc.) just were not intuitive. IronPython, however, is a dream, because it uses all the .NET libraries instead (or as well, if you like). I'm pretty familiar with the .NET BCL, so this was great.

Actual content :-)

During my ~4 years at my current job, I have built up a large collection of music that I listen to. I also have a large collection of music at home. As I'm leaving in 3 weeks and will lose all that data, I wanted to take a copy of the music from my work computer home.

The problem with this is that I only have a 6 gig iPod mini to transfer the songs on, so I can't just copy them all. I needed to diff the 2 music libraries and only copy the songs that I don't have at home already. iTunes produces a large, hairy pile of crap of an XML file when you ask it to export its library, so this, I thought, would be an opportunity to play about with some IronPython, and post it on the net in case it's useful to anyone else.
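For anyone who hasn't opened one of these exports: it is a property list, and each track is a <dict> whose children alternate between a <key> and the value that goes with it. Here is a minimal sketch of reading that shape. It uses CPython's standard xml.etree.ElementTree rather than the .NET XML classes, the sample XML is hand-written with made-up track data, and track_to_dict is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of an iTunes export.
# The track data is made up; a real export has many more keys per track.
SAMPLE = """<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>101</key>
    <dict>
      <key>Track ID</key><integer>101</integer>
      <key>Name</key><string>Prayer</string>
      <key>Artist</key><string>Disturbed</string>
      <key>Location</key><string>file://C:/path/disturbed_prayer.mp3</string>
    </dict>
  </dict>
</dict>
</plist>"""


def track_to_dict(track_elem):
    """Pair each <key> element with the value element that follows it."""
    children = list(track_elem)
    keys = children[0::2]    # every even-positioned child is a <key>
    values = children[1::2]  # every odd-positioned child is its value
    return {k.text: v.text for k, v in zip(keys, values)}


# The root element is <plist>, so this is the same path as /plist/dict/dict/dict
root = ET.fromstring(SAMPLE)
tracks = [track_to_dict(t) for t in root.findall("./dict/dict/dict")]

for track in tracks:
    print("%s: %s -> %s" % (track["Artist"], track["Name"], track["Location"]))
```

Looking values up by key name like this avoids depending on where an entry sits inside the track, which matters for entries such as Location that move around.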

Here is the script; the comments are the documentation :-). Hopefully it's useful, if only as a quick demo of how things work in IronPython.

# Import all the libraries we'll need
import clr
clr.AddReferenceByPartialName( "System.Xml" )
from System import *
from System.IO import *
from System.Xml import *

# Create a helper function to convert the iTunes XML file into a hash so it's actually useful
# An example of one of the hash entries: ret[ 'Disturbed: Prayer' ] = 'file://C:/path/disturbed_prayer.mp3'
def fileToHash( fileName ):
    ret = {}
    doc = XmlDocument()
    doc.Load( fileName )
    # The XPath is ugly... Export your iTunes library and take a look to see why
    for elem in doc.DocumentElement.SelectNodes( "/plist/dict/dict/dict" ):
        # song name is always the first <string>
        song = elem.SelectSingleNode( "string[1]/text()").Value
        # artist is always the second <string>
        artist = elem.SelectSingleNode( "string[2]/text()").Value
        # Unfortunately the filename is not fixed in the structure, so we have to
        # find <key>Location</key> and then move to the next element after it
        path = elem.SelectSingleNode("key[text() = 'Location']").NextSibling.FirstChild.Value
        # Add it to our return-param
        ret[ "%s: %s" % ( artist, song ) ] = path
    return ret

# Parse both files into separate hashes
homeSongs = fileToHash( "itunes library home.xml" )
workSongs = fileToHash( "itunes library work.xml" )

# Create a new dict containing only songs that are at work but NOT at home
diffSongs = dict( [ (song,workSongs[song]) for song in workSongs if song not in homeSongs ] )

# Write them all to the output file
# The format of the output file is: file://C:/path/disturbed_prayer.mp3 # Disturbed: Prayer
# The reason I've done it like this with the path first is hopefully to make it easier
# for another script to be able to use it as a list of filenames to copy...
writer = StreamWriter( "diffSongs.txt" )
for line in [ "%s # %s" % (diffSongs[song], song) for song in diffSongs ]:
    writer.WriteLine( line )
# Close the writer so any buffered lines are actually flushed to disk
writer.Close()
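Since the output puts the path first so that another script can consume it, here is a rough sketch of what that follow-up script might look like. It is plain Python using only the standard library, not something from the original script. The function names are hypothetical, and location_to_path in particular is an assumption about how the file:// locations map to local paths; check it against your own export before trusting it.

```python
import os
import shutil
from urllib.parse import unquote


def parse_diff_line(line):
    """Split 'location # Artist: Song' into (location, label)."""
    location, _, label = line.rstrip("\r\n").partition(" # ")
    return location, label


def location_to_path(location):
    """Turn a file:// location into a local path.

    Assumption: dropping the 'file://' scheme (and a 'localhost/' host,
    if present) and undoing %-escapes is enough to get a usable path.
    """
    path = location
    if path.startswith("file://"):
        path = path[len("file://"):]
    if path.startswith("localhost/"):
        path = path[len("localhost/"):]
    return unquote(path)


def copy_listed_songs(list_file, dest_dir):
    """Copy every song in list_file that exists locally into dest_dir.

    Returns the labels of the songs that were copied.
    """
    copied = []
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            location, label = parse_diff_line(line)
            src = location_to_path(location)
            if os.path.isfile(src):
                shutil.copy2(src, dest_dir)
                copied.append(label)
    return copied

# Example call (the destination path is hypothetical):
# copy_listed_songs("diffSongs.txt", "E:/from_work")
```

Songs whose source file can't be found are skipped silently here; a real run would probably want to log them instead.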

Disclaimer:
1) This was meant to be quick to write; I didn't care about run-time performance or any other nifty tricks. It only takes 3 seconds to parse 2x 3.5 meg XML files, and that's good enough for me.
2) Sorry about the lack of syntax highlighting; I couldn't find any decent way to do it short of screenshotting my PSPad window.
