Quantified Awesome: How much music do you have?
I don’t listen to music a lot. Words interfere with my programming or writing (hmm, I should test to see how big the effect is), and I got used to working in silence or with white noise. Some people have a lot of music, though. R. Galacho wrote this Python script that uses the ID3 information in MP3s to sum up listening time in each genre, and wanted me to share it in case anyone else might find it useful:
# -*- coding: utf-8 -*-
__doc__="""
muasure v.0.1

How long could you listen...

Many times I've talked with friends about my digital record collection's
size (mmm... we are talking in the order of GB) and how long could I've
been listen if I make a playlist with the hole collection and play it
completely.

Well, I made some mind calculations setting up average time and making a
proportion to the number of files taken in my HDD.

So, spare time and the speed and versatility inherent to Python give me
the rest.

Music Measure (Muasure for short) calculates the total time of your music
collection. Finally show data in screen and writes a text file with that
information into collection base directory (so if you clean your screen
you don't have to relaunch the process).

The only parameter expected is the base location of your record collection
(by default is the current directory when invoked).

Written, tested, runned and commited on GNU Emacs 23.2.1
(i686-pc-linux-gnu, GTK+ Version 2.24.4) ;)

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
"""

__author__="R. Galacho"
__version__="0.0.1"
__date__="20111010"

import os, sys, re
from datetime import timedelta
from mutagen.mp3 import MP3, HeaderNotFoundError
from collections import OrderedDict
from cStringIO import StringIO

def get_total_time(directory, collection):
    content_types = dict()
    lengths = 0
    for mp3_file in collection:
        try:
            id3v = MP3(mp3_file)
            lengths += id3v.info.length
            content = str(id3v.get('TCON'))
            if content_types.get(content) == None:
                content_types[content] = id3v.info.length
            else:
                content_types[content] += id3v.info.length
        except HeaderNotFoundError:
            sys.stderr.write("Error reading file %s\n" % mp3_file)
    total_time = timedelta(seconds = lengths)
    avg_length = timedelta(seconds = float(lengths / len(collection)))
    file_str = StringIO()
    file_str.writelines(["Total time : ", str(total_time), "\nAverage time: ", str(avg_length), "\n\n"])
    ord_content_types = OrderedDict(sorted(content_types.items(), key=lambda t: t[1], reverse=True))
    for (k, v) in ord_content_types.items():
        total_time = timedelta(seconds = content_types.get(k))
        file_str.writelines([k.ljust(15), ": ", str(total_time),"\n"])
    print file_str.getvalue()
    if os.access(directory, os.W_OK):
        result_file = open(('%s%smuasure-data.txt' % (directory, os.sep)), 'w')
        result_file.write(file_str.getvalue())
        result_file.flush()
        result_file.close()
    file_str.close()

def main(collection_dir):
    directory = os.path.expanduser(collection_dir)
    if not os.access(directory, os.R_OK):
        raise Exception("Not enough permission on %s" % directory)
    collection = []
    pattern = re.compile(r'\.mp3')
    for dir, subdirs, files in os.walk(directory):
        collection.extend("%s%s%s" % (dir, os.sep, f) for f in filter(lambda x: pattern.search(x), files))
    collection = map(os.path.abspath, collection)
    get_total_time(directory, collection)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        main("./")
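This requires Python 2.7 and Python-mutagen 1.19 or later. (The script imports collections.OrderedDict, which was added in Python 2.7, and it uses the Python 2 print statement and cStringIO, so it will not run unmodified on Python 3.)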
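If you are on Python 3, here is a minimal sketch of the same idea. It is my own rough adaptation, not R. Galacho's code: the function names (find_mp3s, read_track, summarize, format_report) and the "Unknown" label for files without a genre tag are assumptions I made for illustration. It assumes mutagen is installed, and it only prints the report instead of also writing muasure-data.txt.

```python
from collections import defaultdict
from datetime import timedelta
from pathlib import Path


def find_mp3s(base_dir):
    """Return every file under base_dir whose extension is .mp3 (any case)."""
    root = Path(base_dir).expanduser()
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() == ".mp3")


def read_track(path):
    """Return (genre, seconds) for one MP3 file.

    mutagen is imported here so the rest of the sketch works without it.
    """
    from mutagen.mp3 import MP3  # third-party package
    audio = MP3(str(path))
    genre = audio.get("TCON")  # ID3 genre frame, the one the original script reads
    # "Unknown" is a placeholder label chosen for this sketch.
    return (str(genre) if genre is not None else "Unknown", audio.info.length)


def summarize(tracks):
    """Aggregate (genre, seconds) pairs.

    Returns (total_seconds, average_seconds, [(genre, seconds), ...]) with the
    genres sorted from longest to shortest total time.
    """
    per_genre = defaultdict(float)
    count = 0
    for genre, seconds in tracks:
        per_genre[genre] += seconds
        count += 1
    total = sum(per_genre.values())
    average = total / count if count else 0.0  # avoid dividing by zero
    ranked = sorted(per_genre.items(), key=lambda item: item[1], reverse=True)
    return total, average, ranked


def format_report(total, average, ranked):
    """Build a plain-text report with durations rounded to whole seconds."""
    lines = [
        "Total time  : %s" % timedelta(seconds=round(total)),
        "Average time: %s" % timedelta(seconds=round(average)),
        "",
    ]
    for genre, seconds in ranked:
        lines.append("%s: %s" % (genre.ljust(15), timedelta(seconds=round(seconds))))
    return "\n".join(lines)


def main(base_dir="."):
    """Scan base_dir, skip files mutagen cannot parse, and print the report."""
    tracks = []
    for path in find_mp3s(base_dir):
        try:
            tracks.append(read_track(path))
        except Exception as error:  # e.g. mutagen's HeaderNotFoundError
            print("Error reading file %s: %s" % (path, error))
    print(format_report(*summarize(tracks)))
```

To try it, call main("~/Music") (or whatever your collection's base directory is) from a Python 3 prompt. Keeping summarize separate from read_track means you can check the arithmetic with made-up durations before pointing it at real files.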
What else can you automatically extract from the files or data you already have? People have done interesting analyses based on geocoded photos, times of tweets, and so on. Have fun exploring!