Quantified Awesome: How much music do you have?


I don’t listen to music a lot. Words interfere with my programming or writing (hmm, I should test to see how big the effect is), and I got used to working in silence or with white noise. Some people have a lot of music, though. R. Galacho wrote this Python script that uses the ID3 information in MP3s to sum up listening time in each genre, and wanted me to share it in case anyone else might find it useful:

# -*- coding: utf-8 -*-

"""
muasure v.0.1

How long could you listen...  Many times I've talked with friends about
the size of my digital record collection (mmm... we are talking on the
order of GB) and how long I could listen if I made a playlist of the
whole collection and played it through.  Well, I did some mental
calculations, estimating an average track time and scaling it by the
number of files on my HDD.

So, spare time and the speed and versatility inherent to Python gave me
the rest.

Music Measure (Muasure for short) calculates the total time of your
music collection. It then shows the data on screen and writes a text
file with that information into the collection's base directory (so if
you clear your screen you don't have to relaunch the process).

The only parameter expected is the base location of your record
collection (by default, the current directory when invoked).

Written, tested, run and committed on GNU Emacs 23.2.1 (i686-pc-linux-gnu, GTK+ Version 2.24.4) ;)

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""

__author__="R. Galacho"

import os, sys, re
from datetime import timedelta
from mutagen.mp3 import MP3, HeaderNotFoundError
from collections import OrderedDict
from cStringIO import StringIO

def get_total_time(directory, collection):
    content_types = dict()  # genre (ID3 TCON frame) -> seconds of music
    lengths = 0             # running total in seconds
    for mp3_file in collection:
        try:
            id3v = MP3(mp3_file)
            lengths += id3v.info.length
            content = str(id3v.get('TCON'))
            if content_types.get(content) is None:
                content_types[content] = id3v.info.length
            else:
                content_types[content] += id3v.info.length
        except HeaderNotFoundError:
            # Not a readable MP3: report it and keep going
            sys.stderr.write("Error reading file %s\n" % mp3_file)

    total_time = timedelta(seconds = lengths)
    avg_length = timedelta(seconds = float(lengths / len(collection)))
    file_str = StringIO()

    file_str.writelines(["Total time  : ", str(total_time), "\nAverage time: ", str(avg_length), "\n\n"])
    ord_content_types = OrderedDict(sorted(content_types.items(), key=lambda t: t[1], reverse=True))
    for (k, v) in ord_content_types.items():
        total_time = timedelta(seconds = content_types.get(k))
        file_str.writelines([k.ljust(15), ": ", str(total_time),"\n"])

    print file_str.getvalue()
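    if os.access(directory, os.W_OK):
        # Save the same report next to the collection
        result_file = open(('%s%smuasure-data.txt' % (directory, os.sep)), 'w')
        result_file.write(file_str.getvalue())
        result_file.close()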


def main(collection_dir):
    directory = os.path.expanduser(collection_dir)

    if not os.access(directory, os.R_OK):
        raise Exception("Not enough permission on %s" % directory)

    collection = []
    pattern = re.compile(r'\.mp3$')  # match the extension only at the end of the name

    for dir, subdirs, files in os.walk(directory):
        collection.extend("%s%s%s" % (dir, os.sep, f) for f in filter(lambda x: pattern.search(x), files))

    collection = map(os.path.abspath, collection)
    get_total_time(directory, collection)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        # No argument given: default to the current directory
        main(os.getcwd())

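This requires Python 2.7 and Python-mutagen 1.19 or later. OrderedDict only joined the collections module in Python 2.7, and the print statement and cStringIO mean the script will not run on Python 3 as written.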
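
If you are on Python 3, the per-genre totalling at the heart of the script can be sketched with the standard library alone. This is only a rough sketch, not R. Galacho's code: it assumes you have already collected (genre, seconds) pairs by some other means (for example, by reading each file with mutagen as the script does), and the name summarize_durations and the sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import timedelta


def summarize_durations(tracks):
    """Sum track lengths per genre.

    `tracks` is an iterable of (genre, seconds) pairs. Returns
    (total, average, per_genre), where per_genre is a list of
    (genre, timedelta) pairs sorted from longest to shortest.
    """
    per_genre = defaultdict(float)
    total_seconds = 0.0
    count = 0
    for genre, seconds in tracks:
        per_genre[genre] += seconds
        total_seconds += seconds
        count += 1

    # Guard against an empty collection so the average never divides by zero.
    average_seconds = total_seconds / count if count else 0.0
    ranked = sorted(per_genre.items(), key=lambda item: item[1], reverse=True)
    return (
        timedelta(seconds=total_seconds),
        timedelta(seconds=average_seconds),
        [(genre, timedelta(seconds=secs)) for genre, secs in ranked],
    )


if __name__ == "__main__":
    # Made-up sample data, not measurements from a real collection.
    sample = [("Rock", 240), ("Jazz", 300), ("Rock", 180)]
    total, average, per_genre = summarize_durations(sample)
    print("Total time  :", total)
    print("Average time:", average)
    for genre, length in per_genre:
        print(genre.ljust(15), ":", length)
```

Keeping the arithmetic separate from the file reading makes it easy to check the totals on a handful of known tracks before pointing anything at a whole collection.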

What else can you automatically extract from the files or data you already have? People have done interesting analyses based on geocoded photos, times of tweets, and so on. Have fun exploring!
