But I've just stored these in iTunes. Not only does that mean they're locked within the Apple Empire, they're also vulnerable to me losing my hard drive. So I wanted to get them into real text files. Luckily, iTunes lets you export playlists. Unluckily, the export is in some bizarre janky format, when all I really want is the artist, title, and album for each song. Simple Python script to the rescue.
Ah, but even after deleting some of the crud, I was left with a file in a mash of encodings! See, I had pulled out artist, title, and album, then concatenated them with commas, then written that to a file. But I hadn't paid attention to encodings, so I had some UTF-16 characters, then some UTF-8 commas, then more UTF-16 characters. But Python has an easy answer: read the input file as UTF-16, declare the output file as UTF-8, and inside the script just deal with strings without worrying about encodings.
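To make the "decode on the way in, encode on the way out" idea concrete, here is a minimal sketch. The function name `convert_utf16_to_utf8` and the file names are made up for illustration, and it uses `io.open` with an explicit `encoding` rather than the `codecs.open` call in my script; either should do the same job.

```python
import io
import os
import tempfile


def convert_utf16_to_utf8(src_path, dst_path):
    """Decode src_path as UTF-16, then re-encode the same text as UTF-8."""
    # Decoding happens once, at the input boundary.
    with io.open(src_path, "r", encoding="utf-16") as src:
        text = src.read()
    # In between, `text` is an ordinary unicode string; no encoding applies.
    # Encoding happens once, at the output boundary.
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return text


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "export.txt")        # hypothetical name
    dst = os.path.join(workdir, "export_utf8.txt")   # hypothetical name
    with io.open(src, "w", encoding="utf-16") as f:
        f.write(u"Bj\u00f6rk, J\u00f3ga, Homogenic")
    convert_utf16_to_utf8(src, dst)
    # Same text, different byte counts on disk.
    print(os.path.getsize(src), os.path.getsize(dst))
```

The point is that the commas never get a chance to be "UTF-8 commas" in the middle of UTF-16 data: everything is text until the moment it is written.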
Tim Bray explains UTF-8, UTF-16, and UTF-32 clearly; this is something I probably should have thoroughly understood a while ago.
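A quick way to see the difference between the three for yourself is to count bytes. This is only a sketch; `encoded_sizes` is a helper name I made up, not anything from the linked article.

```python
def encoded_sizes(text):
    """Return how many bytes `text` occupies in each encoding."""
    # The "-le" codecs are used so that no byte-order mark is counted.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# A plain ASCII letter, an accented letter, and a character outside
# the Basic Multilingual Plane (a musical note symbol).
for sample in (u"a", u"\u00e9", u"\U0001F3B5"):
    print(encoded_sizes(sample))
```

UTF-8 grows from one byte to four as the characters get more exotic, UTF-16 uses two or four, and UTF-32 is always four.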
Evan Jones has a nice overview of how to use unicode in Python.
And here's my script:
#!/usr/bin/env python
import codecs

for filename in open("filenames.txt"):  # next time I'll learn
                                        # syntax for "for filename
                                        # in current directory"
    filename = filename.strip()
    outfilename = "output/" + filename.replace(" ", "_")
    outfile = codecs.open(outfilename, "w", "utf-8")
    # decode the export as UTF-16; a "line" may still hold several
    # records separated by "\r", so split those apart too
    for bigline in codecs.open(filename, "r", "utf-16"):
        lines = bigline.split("\r")
        for line in lines:
            parts = line.split("\t")
            if len(parts) < 4:
                continue  # skip anything without enough columns
            song = parts[0]
            artist = parts[1]
            album = parts[3]
            linetowrite = "%s, %s, %s\n" % (artist, song, album)
            outfile.write(linetowrite)
    outfile.close()
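As for the "for filename in current directory" syntax I didn't know: one possible sketch uses `os.listdir`. The function names `extract_songs` and `convert_directory` and the `.txt` suffix filter are my own assumptions for illustration; the column positions (0, 1, 3) are the ones from the script above.

```python
import io
import os


def extract_songs(text):
    """Yield "artist, song, album" lines from tab-separated export text."""
    # splitlines() splits on "\r", "\n" and "\r\n" (among other boundaries).
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 4:
            continue  # not enough columns to be a song row
        song, artist, album = parts[0], parts[1], parts[3]
        yield u"%s, %s, %s\n" % (artist, song, album)


def convert_directory(src_dir, out_dir, suffix=".txt"):
    """Convert each UTF-16 file in src_dir ending in suffix to UTF-8 in out_dir."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = []
    for filename in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, filename)
        # Skip subdirectories (such as the output directory) and other files.
        if not filename.endswith(suffix) or not os.path.isfile(src_path):
            continue
        # newline="" leaves the "\r" characters in the export untranslated.
        with io.open(src_path, "r", encoding="utf-16", newline="") as src:
            text = src.read()
        out_path = os.path.join(out_dir, filename.replace(" ", "_"))
        with io.open(out_path, "w", encoding="utf-8", newline="") as out:
            out.writelines(extract_songs(text))
        written.append(out_path)
    return written
```

Calling `convert_directory(".", "output")` would then stand in for the `filenames.txt` loop, with the `with` blocks closing each file as soon as it is done.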
I'd estimate that about 80-90% of programmers don't understand Unicode. It's an amazingly smart encoding. I didn't really understand it until this year, after reading http://www.joelonsoftware.com/articles/Unicode.html (I don't agree with everything Joel says, but he does write well).