Tags: library

RSS - Atom - Subscribe via email

Scripting and the Toronto Public Library’s movie collection

| geek

We hardly ever watch movies in the theatre now, since we prefer watching movies with subtitles and the ability to pause. Fortunately, the Toronto Public Library has a frequently updated collection of DVDs. The best time to grab a movie is when it’s a new release, since DVDs that have been in heavy circulation can get pretty scratched up from use. However, newly released movies can’t be reserved. You need to find them at the library branches they’re assigned to, and then you can borrow them for seven days. You can check the status of each movie online to see if it’s in the library or when it’s due to be returned.

Since there are quite a few movies on our watch list, quite a few library branches we can walk to, and some time flexibility as to when to go, checking all those combinations is tedious. I wrote a script that takes a list of branches and a list of movie URLs, checks the status of each, and displays a table sorted by availability and location. My code gives me a list like this:

In Library Annette Street Mad Max Fury Road M 10-8:30 T 12:30-8:30 W 10-6 Th 12:30-8:30 F 10-6 Sat 9-5
In Library Bloor/Gladstone Inside Out M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5 Sun 1:30-5
In Library Bloor/Gladstone Match M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5 Sun 1:30-5
In Library Jane/Dundas Avengers: Age of Ultron M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5
In Library Jane/Dundas Ant-Man M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5
In Library Jane/Dundas Mad Max Fury Road M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5
In Library Jane/Dundas Minions M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5
In Library Perth/Dupont Chappie T 12:30-8:30 W 10-6 Th 12:30-8:30 F 10-6 Sat 9-5
In Library Runnymede Ant-Man M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5
In Library Runnymede Minions M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5
In Library St. Clair/Silverthorn Kingsman: the Secret Service T 12:30-8:30 W 10-6 Th 12:30-8:30 F 10-6 Sat 9-5
In Library St. Clair/Silverthorn Mad Max Fury Road T 12:30-8:30 W 10-6 Th 12:30-8:30 F 10-6 Sat 9-5
In Library Swansea Memorial Ant-Man T 10-6 W 1-8 Th 10-6 Sat 10-5
In Library Swansea Memorial Chappie T 10-6 W 1-8 Th 10-6 Sat 10-5
In Library Swansea Memorial Kingsman: the Secret Service T 10-6 W 1-8 Th 10-6 Sat 10-5
In Library Swansea Memorial Kingsman: the Secret Service T 10-6 W 1-8 Th 10-6 Sat 10-5
In Library Swansea Memorial Mad Max Fury Road T 10-6 W 1-8 Th 10-6 Sat 10-5
In Library Swansea Memorial Minions T 10-6 W 1-8 Th 10-6 Sat 10-5
2015-12-08 Perth/Dupont Terminator Genisys T 12:30-8:30 W 10-6 Th 12:30-8:30 F 10-6 Sat 9-5
2015-12-08 Perth/Dupont Mad Max Fury Road T 12:30-8:30 W 10-6 Th 12:30-8:30 F 10-6 Sat 9-5
2015-12-09 Swansea Memorial Avengers: Age of Ultron T 10-6 W 1-8 Th 10-6 Sat 10-5

… many more rows omitted. =)

With this data, I can decide that Swansea Memorial has a bunch of things I might want to check out, and pick that as the destination for my walk. Sure, there’s a chance that someone else might check out the movies before I get there (although I can minimize that by getting to the library as soon as it opens), or that the video has been misfiled or misplaced, but overall, the system tends to work fine.

It’s easy for me to send the output to myself by email, too. I just select the part of the table I care about and use Emacs’ M-x shell-command-on-region (M-|) to mail it to myself with the command mail -s "Videos to check out" sacha@sachachua.com.

The first time I ran my script, I ended up going to Perth/Dupont to pick up seven movies in addition to the two I picked up from Annette Library. Many of the movies had been returned but not yet shelved, so the librarian retrieved them from his bin and gave them to me. When I got back, W- looked at the stack of DVDs by the television and said, “You know that’s around 18 hours of viewing, right?” It’ll be fine for background watching. =)

Little things like this make me glad that I can write scripts and other tiny tools to make my life better. Anything that involves multiple steps or combining information from multiple sources might be simpler with a script. I wrote this script as a command-line tool with NodeJS, since I’m comfortable with the HTML request and parsing libraries available there.

Anyway, here’s the code, in case you want to build on the idea. Have fun!

/* Shows you which videos are available at which libraries.

   Input: A json filename, which should be a hash of the form:
   {"branches": {"Branch name": "Additional branch details (ex: hours)", ...},
   "videos": [{"Title": "URL to library page"}, ...]}.

   Example: {
   "branches": {
   "Runnymede": "M 9-8:30 T 9-8:30 W 9-8:30 Th 9-8:30 F 9-5 Sat 9-5"
   },
   "videos": [
   {"title": "Avengers: Age of Ultron", "url": "http://www.torontopubliclibrary.ca/detail.jsp?Entt=RDM3350205&R=3350205"}
   ]}

   Output:
   Status,Branch,Title,Branch notes
*/

var rp = require('request-promise');
var moment = require('moment');
var async = require('async');
var cheerio = require('cheerio');
var q = require('q');
var csv = require('fast-csv');
var fs = require('fs');

if (process.argv.length < 3) {
  console.log('Please specify the JSON file to read the branches and videos from.');
  process.exit(1);
}

var config = JSON.parse(fs.readFileSync(process.argv[2]));
var branches = config.branches;
var videos = config.videos;

/*
  Returns a promise that will resolve with an array of [status,
  branch, movie, info], where status is either the next due date, "In
  Library", etc. */
function checkStatus(branches, movie) {
  var url = movie.url;
  var matches = url.match(/R=([0-9]+)/);
  return rp.get(
    'http://www.torontopubliclibrary.ca/components/elem_bib-branch-holdings.jspf?print=&numberCopies=1&itemId='
      + matches[1]).then(function(a) {
        var $ = cheerio.load(a);
        var results = [];
        var lastBranch = '';              
        $('tr.notranslate').each(function() {
          var row = $(this);
          var cells = row.find('td');
          var branch = $(cells[0]).text().replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, '');
          var due = $(cells[2]).text().replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, '');
          var status = $(cells[3]).text().replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, '');
          if (branch) { lastBranch = branch; }
          else { branch = lastBranch; }
          if (branches[branch]) {
            if (status == 'On loan' && (matches = due.match(/Due: (.*)/))) {
              status = moment(matches[1], 'DD/MM/YYYY').format('YYYY-MM-DD');
            }
            if (status != 'Not Available - Search in Progress') {
              results.push([status, branch, movie.title, branches[branch]]);
            }
          }
        });
        return results;
      });
}

function checkAllVideos(branches, videos) {
  var results = [];
  var p = q.defer();
  async.eachLimit(videos, 5, function(video, callback) {
    checkStatus(branches, video).then(function(result) {
      results = results.concat(result);
      callback();
    });
  }, function(err) {
    p.resolve(results.sort(function(a, b) {
      if (a[0] == 'In Library') {
        if (b[0] == 'In Library') {
          if (a[1] < b[1]) return -1;
          if (a[1] > b[1]) return 1;
          if (a[2] < b[2]) return -1;
          if (a[2] > b[2]) return 1;
          return 0;
        } else {
          return -1;
        }
      }
      if (b[0] == 'In Library') { return 1; }
      if (a[0] < b[0]) { return -1; }
      if (a[0] > b[0]) { return 1; }
      return 0;
    }));
  });
  return p.promise;
}

checkAllVideos(branches, videos).then(function(result) {
  csv.writeToString(result, {}, function(err, data) {
    console.log(data);
  });
});

P.S. Okay, I’m really tempted to walk over to Swansea Memorial, but W- reminds me that we’ve got a lot of movies already waiting to be watched. So I’ll probably just walk to the supermarket, but I’m looking forward to running this script once we get through our backlog of videos!

Exploring neighbourhood libraries and other notes from the Toronto Public Library Hackathon

Posted: - Modified: | development, geek, kaizen

UPDATE 2015-11-30: Here's the Toronto Public Library's recap, along with other videos.

UPDATE 2015-11-27: Here's the video of my hackathon pitch:

UPDATE 2015-11-18: I figured out how to make this entirely client-side, so you don't have to run a separate server. First, install either Tampermonkey (Chrome) or Greasemonkey (Firefox). Then install the user script insert-visualize-link.user.js , and the Visualize link should appear next to the library branch options on Toronto Public Library search result pages. See the Github repository for more details.

Yay! My neighbourhood library visualization won at the Toronto Public Library hackathon. It added a Visualize link to the search results page which mapped the number of search results by branch. For example, here's a visualization of a search that shows items matching "Avengers comics".

avengers

It's a handy way to see which branches you might want to go to so that you can browse through what's there in person.

Librarians could also use it to help them plan their selections, since it's easy to see the distribution across branches. For example, here's the visualization for books in Tagalog.

tagalog

The collections roughly match up with Wellbeing Toronto's data on Tagalog as the home language, although there are some areas that could probably use collections of their own.

tagalog census

Incidentally, I was delighted to learn that Von Totanes had done a detailed analysis of the library's Filipino collections in the chapter he wrote in Filipinos in Canada: Disturbing Invisibility (Coloma, McElhinny, and Tungohan, 1992). Von sent me the chapter after I mentioned the hackathon on Facebook; yay people bumping into other people online!

Personally, I'm looking forward to using this visualization to see things like which branches have new videos. Videos released in the past year can only be borrowed in person – you can't request them online – so it's good to check branches regularly to see if they're there. It would be even better if the library search engine had a filter for "On the shelf right now", but in the meantime, this visualization tool gives me a good idea of our chances of picking up something new to watch while we're folding laundry. =)

Notes

https://github.com/sachac/explore-neighbourhood-libraries

More notes will probably follow, but here are a few quick drawings:

2015-11-15b Tech - Exploring neighbourhood libraries -- index card #tpl #hackathon

2015-11-15c Kaizen and the Toronto Public Library hackathon -- index card #tpl #hackathon #kaizen #improvement

2015-11-15e Ideas for following up on TPL hackathon -- index card #prototyping #tpl #hackathon

 

The code works by extracting the branch names and totals on the left side of search pages and combining those with the locations of the branches (KML). I don't really need the server component, so I'm thinking of rewriting the script so that it runs entirely client-side – maybe as a Chrome extension or as a user script. That way, other people can play with the idea without running their own server (and without my having to keep a server around), and we can try it out without waiting for the library to integrate it into their website. That said, it would be totally awesome to get it into the interface of the Toronto Public Library! We'll just have to see if it can happen. =) Happy to chat with library geeks to get this sorted out.

It was fun working on this. W- decided to join me at the last minute, so it turned into a fun weekend of hanging out with my husband at the library. I wanted to keep my weekend flexible and low-key, so I decided not to go through the team matchmaking thing. W- found some comfy chairs in the corner of the area, I plugged in the long extension cord I brought, and we settled in.

I learned a lot from the hackathon mentors. In particular, I picked up some excellent search and RSS tips from Alan Harnum. You can't search with a blank query, but he showed me how you can start with a text string, narrow the results using the facets on the left side, and then remove the text string from the query in order to end up with a search that uses only the facets. He also showed me that the RSS feed had extra information that wasn't in the HTML source and that it could be paginated with URL parameters. Most of the RSS feeds I'd explored in the past were nonpaginated subsets of the information presented on the websites, so it was great to learn about the possibilities I had overlooked.

The faceted search was exactly what I needed to list recent videos even if I didn't know what they were called, so I started thinking of fun tools that would make hunting for popular new videos easier. (There have been quite a few times when I've gone to a library at opening time so that I could snag a video that was marked as available the night before!) In addition to checking the specific item's branch details to see where it was on the shelf and which copies were out on loan, I was also curious about whether we were checking the right library, or if other libraries were getting more new videos than our neighbourhood library was.

W- was curious about the Z39.50 protocol that lets you query a library catalogue. I showed him the little bits I'd figured out last week using yaz-client from the yaz package, and he started digging into the protocol reference. He figured out how to get it to output XML (format xml) and how to search by different attributes. I'm looking forward to reading his notes on that.

Me, I figured that there might be something interesting in the visualization of new videos and other items. I hadn't played around a lot with geographic visualization, so it was a good excuse to pick up some skills. First, I needed to get the data into the right shape.

Step 1: Extract the data and test that I was reading it correctly

I usually find it easier to start with the data rather than visualizations. I like writing small data transformation functions and tests, since they don't involve complex external libraries. (If you miss something important when coding a visualization, often nothing happens!)

I wrote a function to extract information from the branch CSV on the hackathon data page, using fast-csv to read it as an array of objects. I tested that with jasmine-node. Tiny, quick accomplishment.

Then I worked on extracting the branch result count from the search results page. This was just a matter of finding the right section, extracting the text, and converting the numbers. I saved a sample results page to my project and used cheerio to parse it. I decided not to hook it up to live search results until I figured out the visualization aspect. No sense in hitting the library website repeatedly or dealing with network delays.

Step 2: Make a simple map that shows library branches

I started with the Google Maps earthquake tutorial. The data I'd extracted had addresses but not coordinates. I tried using the Google geocoder, but with my rapid tests, I ran into rate limits pretty early. Then it occurred to me that with their interest in open data, the library was the sort of place that would probably have a file with branch coordinates in terms of latitude and longitude. The hackathon data page didn't list any obvious matches, but a search for Toronto Public Library KML (an extension I remembered from W-'s explorations with GPS and OpenStreetMap) turned up the file I wanted. I wrote a test to make sure this worked as I expected.

Step 3: Combine the data

At first I tried to combine the data on the client side, making one request for the branch information and another request for the results information. It got a bit confusing, though – I need to get the hang of using require in a from-scratch webpage. I decided the easiest way to try my idea out was to just make the server combine the data and return the GeoJSON that the tutorial showed how to visualize. That way, my client-side HTML and JS could stay simple.

Step 4: Fiddle with the visualization options

Decisions, decisions… Red was too negative. Blue and green were hard to see. W- suggested orange, and that worked out well with Google Maps' colours. Logarithmic scale or linear scale? Based on a maximum? After experimenting with a bunch of options, I decided to go with a linear scale (calculated on the server), since it made sense for the marker for a branch with a thousand items to be significantly bigger than a branch with five hundred items. I played with this a bit until I came up with maximum and minimum sizes that made sense to me.

Step 5: Hook it up to live search data

I needed to pass the URL of the search results, and I knew I wanted to be able to call the visualization from the search results page itself. I used TamperMonkey to inject some Javascript into the Toronto Public Library webpage. The library website didn't use JQuery, so I looked up the plain-vanilla Javascript way of selecting and modifying elements.

document.querySelector('#refinements-library_branch')
  .parentNode.querySelector('h3').innerHTML =
  'Library Branch <a target="_blank" style="color: white; ' +
  'text-decoration: underline" ' +
  'href="http://localhost:9000/viz.html?url=' +
  encodeURIComponent(location.href) + '">(Visualize)</a>';

Step 6: Tweak the interface

I wanted to display information on hover and filter search results on click. Most of the tutorials I saw focused on how to add event listeners to individual markers, but I eventually found an example that showed how to add a listener to map.data and get the information from the event object. I also found out that you could add a title attribute and get a simple tooltip to display, which was great for confirming that I had the data all lined up properly.

Step 7: Cache the results

Testing with live data was a bit inconvenient because of occasional timeouts from the library website, so I decided to cache search results to the filesystem. I didn't bother writing code for checking last modification time, since I knew it was just for demos and testing.

Step 8: Prettify the hover

The tooltip provided by title was a little bare, so I decided to spend some time figuring out how to make better information displays before taking screenshots for the presentation. I found an example that showed how to create and move an InfoWindow based on the event's location instead of relying on marker information, so I used that to show the information with better formatting.

Step 9: Make the presentation

Here's how I usually plan short presentations:

  1. Figure out the key message and the flow.
  2. Pick a target words-per-minute rate and come up with a word budget.
  3. Draft the script, checking it against my word budget.
  4. Read the script out loud a few times, checking for time, tone, and hard-to-say phrases.
  5. Annotate the script with notes on visual aids.
  6. Make visuals, take screenshots, etc.
  7. Record and edit short videos, splitting them up in Camtasia Studio by using markers so that I can control the pace of the video.
  8. Copy the script (or keywords) into the presenter's notes.
  9. Test the script for time and flow, and revise as needed.

I considered two options for the flow. I could start with the personal use case (looking for new videos) and then expand from there, tying it into the library's wider goals. That would be close to how I developed it. Or I could start with one of the hackathon challenges, establish that connection with the library's goals, and then toss in my personal use case as a possibly amusing conclusion. After chatting about it with W- on the subway ride home from the library, I decided to start with the second approach. I figured that would make it easier for people to connect the dots in terms of relevance.

I used ~140wpm as my target, minus a bit of a buffer for demos and other things that could come up, so roughly 350 words for 3 minutes. I ran through the presentation a few times at home, clocking in at about 2:30. I tend to speak more quickly when I'm nervous, so I rehearsed with a slightly slower pace. That way, I could get a sense of what the pace should sound like. During the actual presentation, though, I was a teensy bit over time – there was a bit of unexpected applause. Also, even though I remembered to slow down, I didn't breathe as well as I probalby should've; I still tend to breathe a little shallowly when I'm on stage. Maybe I should pick a lower WPM for presentations and add explicit breathing reminders. =)

I normally try to start with less material and then add details to fit the time. That way, I can easily adjust if I need to compress my talk, since I've added details in terms of priority. I initially had a hard time concisely expressing the problem statement and tying together the three examples I wanted to use, though. It took me a few tries to get things to fit into my word budget and flow in a way that made me happy.

Anyway, once I sorted out the script, I came up with some ideas for the visuals. I didn't want a lot of words on the screen, since it's hard to read and listen at the same time. Doodles work well for me. I sketched a few images and created a simple sequence. I took screenshots for the key parts I wanted to demonstrate, just in case I didn't get around to doing a live demo or recording video. That way, I didn't have to worry about scrambling to finish my presentation. I could start with something simple but presentable, and then I could add more frills if I had time.

Once the static slides were in place, I recorded and edited videos demonstrating the capabilities. Video is a nice way to give people a more real sense of how something works without risking as many technical issues as a live demo would.

I had started with just my regular resolution (1366×768 on my laptop) and a regular browser window, but the resulting video was not as sharp as it could have been. Since the presentation template had 4:3 aspect ratio, I redid the video with 1024×768 resolution and a full-screen browser in order to minimize the need for resizing.

I sped up boring parts of the video and added markers where I wanted to split it into slides. Camtasia Studio rendered the video into separate files based on my markers. I added the videos to individual slides, setting them to play automatically. I like the approach of splitting up videos onto separate slides because it allows me to narrate at my own pace instead of speeding up or slowing down to match the animation.

I copied the segments of my script to the presenter notes for each slide, and I used Presenter View to run through it a few more times so that I could check whether the pace worked and whether the visuals made sense. Seemed all right, yay!

Just in time, too. I had a quick lunch and headed off to the library for the conclusion of the hackathon.

There wsa a bit of time before the presentations started. I talked to Alan again to show him what I'd made, hear about what he had been working on, and pick his brain to figure out which terms might resonate with the internal jargon of the library – little things, like what they call the people who decide what kinds of books should be in which libraries, or what they call the things that libraries lend. (Items? Resources? Items.) Based on his feedback, I edited my script to change "library administrators" to "selection committees". I don't know if it made a difference, but it was a good excuse to learn more about the language people used.

I tested that the presentation displayed fine on the big screen, too. It turned out that the display was capable of widescreen input at a higher resolution than what I'd set, but 1024×768 was pretty safe and didn't look too fuzzy, so I left it as it was. I used my presentation remote to flip through the slides while confirming that things looked okay from the back of the room (colours, size, important information not getting cut off by people's heads, etc.). The hover text was a bit small, but it gave the general idea.

And then it was presentation time. I was third, which was great because once I finished, I could focus on other people's presentations and learn from their ideas. Based on W-'s cellphone video, it looks like I remembered to use the microphone so that the library could record, and I remembered to look up from my presenter notes and gesture from time to time (hard when you're hidden behind the podium, but we do what we can!). I stayed pretty close to my script, but I hope I kept the script conversational enough that it sounded more like me instead of a book. I didn't have the mental bandwidth to keep an eye on the timer in the center of the presenter view, but fortunately the time worked out reasonably well. I concluded just as the organizer was getting up to nudge me along, and I'd managed to get to all the points I wanted to along the way. Whew!

Anyway, that's a quick braindump of the project and what it was like to hack it together. I'll probably write some more about following up on ideas and about other people's presentations, but I wanted to get this post out there while the experience was fresh in my head. It was fun. I hope the Toronto Public Library will take the hackathon ideas forward, and I hope they'll get enough out of the hackathon that they'll organize another one! =)

View or add comments (Disqus), or e-mail me at sacha@sachachua.com

Wow, literate devops with Emacs and Org does actually work on Windows

| emacs, org

Since I persist in using Microsoft Windows as my base system, I’m used to not being able to do the kind of nifty tricks that other people do with Emacs and shell stuff.

So I was delighted to find that the literate devops that Howard Abrams described – running shell scripts embedded in an Org Mode file on a remote server – actually worked with Plink.

Here’s my context: The Toronto Public Library publishes a list of new books on the 15th of every month. I’ve written a small Perl script that parses the list for a specified category and displays the Dewey decimal code, title, and item ID. I also have another script (Ruby on Rails, part of quantifiedawesome.com) that lets me request multiple items by pasting in text containing the item IDs. Tying these two together, I can take the output of the library new releases script, delete the lines I’m not interested in, and feed those lines to my library request script.

Instead of starting Putty, sshing to my server, and typing in the command line myself, I can now use C-c C-c on an Org Mode block like this:

#+begin_src sh :dir /sacha@direct.sachachua.com:~
perl library-new.pl Business
#+end_src

That’s in a task that’s scheduled to repeat monthly, for even more convenience, and I also have a link there to my web-based interface for bulk-requesting files. But really, now that I’ve got it in Emacs, I should add a #+NAME: above the #+RESULTS: and have Org Mode take care of requesting those books itself.

On a related note, I’d given up on being able to easily use TRAMP from Emacs on Windows before, because Cygwin SSH was complaining about a non-interactive terminal.

ssh -l sacha  -o ControlPath=c:/Users/Sacha/AppData/Local/Temp/tramp.13728lpv.%r@%h:%p -o ControlMaster=auto -o ControlPersist=no -e none direct.sachachua.com && exit || exit
Pseudo-terminal will not be allocated because stdin is not a terminal.
ssh_askpass: exec(/usr/sbin/ssh-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(/usr/sbin/ssh-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(/usr/sbin/ssh-askpass): No such file or directory
Permission denied (publickey,password).

As it turns out, setting the following made it work for me.

(setq tramp-default-method "plink")

Now I can do things like the following:

(find-file "/sacha@direct.sachachua.com:~/library-new.pl")

… which is, incidentally, this file (edited to remove my credentials):

#!/usr/bin/perl
# Displays new books from the Toronto Public Library
#
# Author: Sacha Chua (sacha@sachachua.com)
#
# Usage:
# perl library-new.pl <category> - print books
# perl library-new.pl <file> <username> <password> - request books
#
use Date::Parse;

#!/usr/bin/perl -w
use strict;
use URI::URL;
use WWW::Mechanize;
use WWW::Mechanize::FormFiller;
use WWW::Mechanize::TreeBuilder;
use HTML::TableExtract;
use Data::Dumper;
sub process_account($$$);
sub trim($);
sub to_renew($$);
sub clean_string($);

# Get the arguments
if ($#ARGV < 0) {
    print "Usage:\n";
    print "perl library-new.pl <category/filename> [username] [password]\n";
    exit;
}

my $agent = WWW::Mechanize->new( autocheck => 1 );
my $formfiller = WWW::Mechanize::FormFiller->new();
if ($#ARGV > 0) {
  my $filename = shift(@ARGV);
  my $username = "NOT ACTUALLY MY USERNAME";
  my $password = "NOT ACTUALLY MY PASSWORD";
  print "Requesting books\n";
  request_books($agent, $username, $password, $filename);
} else {
  my $category = shift(@ARGV);
  WWW::Mechanize::TreeBuilder->meta->apply($agent);
  print_new_books($agent, $category);
}


## FUNCTIONS ###############################################

# Perl trim function to remove whitespace from the start and end of the string
sub trim($)
{
  my $string = shift;
  $string =~ s/^\s+//;
  $string =~ s/\s+$//;
  return $string;
}

sub request_books($$$$)
{
  my $agent = shift;
  my $username = shift;
  my $password = shift;
  my $filename = shift;

  # Read titles and IDs
  open(DATA, $filename) or die("Couldn't open file.");
  my @lines = <DATA>;
  close(DATA);

  my %requests = ();

  my $line;
  my $title;
  my $id;
  foreach $line (@lines) {
    ($title, $id) = split /\t/, $line;
    chomp $id;
    $requests{$id} = $title;
  }

  # Log in
  log_in_to_library($agent, $username, $password);
  print "Retrieving checked-out and requested books...";
  # Retrieve my list of checked-out and requested books
  my $current_ids = get_current_ids($agent);

  # Retrieve a stem URL that I can use for requests
  my $base_url = 'https://www.torontopubliclibrary.ca/placehold?itemId=';
  my @already_out;
  my @success;
  my @failure;
  # For each line in the file
  while (($id, $title) = each(%requests)) {
    # Do I already have it checked out or on hold? Skip.
    if ($current_ids->{$id}) {
      push @already_out, $title . " (" . $current_ids->{$id} . ")";
    } else {
      # Request the hold
      my $url = $base_url . $id;
      $agent->get($url);
      $agent->form_name('form_place-hold');
      $agent->submit();
      if ($agent->content =~ /The hold was successfully placed/) {
        # print "Borrowed ", $title, "\n";
        ## Did it successfully get checked out? Save title in success list
        push @success, $title;
      } else {
        # Else, save title and ID in fail list
        push @failure, $title . "\t" . $id;
      }
    }
  }
  # Print out success list
  if ($#success > 0) {
    print "BOOKS REQUESTED:\n";
    foreach my $title (@success) {
      print $title, "\n";
    }
    print "\n";
  }
  # Print out already-out list
  if ($#already_out > 0) {
    print "ALREADY REQUESTED/CHECKED OUT:\n";
    foreach my $s (@already_out) {
      print $s, "\n";
    }
    print "\n";
  }
  # Print out fail list
  if ($#failure > 0) {
    print "COULD NOT REQUEST:\n";
    foreach my $s (@failure) {
      print $s, "\n";
    }
    print "\n";
  }
}

sub get_current_ids($)
{
  my $agent = shift;
  my %current_ids = ();
  my $string = $agent->content;
  while ($string =~ m/TITLE\^([0-9]+)\^/g) {
    $current_ids{$1} = 'requested';
  }
  while ($string =~ m/RENEW\^([0-9]+)\^/g) {
    $current_ids{$1} = 'checked out';
  }
  return \%current_ids;
}

sub print_new_books($$)
{
  my $agent = shift;
  my $category = shift;
  $agent->env_proxy();
  $agent->get('http://oldcatalogue.torontopubliclibrary.ca');
  $agent->follow_link(text_regex => qr/Our Newest Titles/);
  $agent->follow_link(text_regex => qr/$category/i);

  my $continue = 1;
  while ($continue) {
    print_titles_on_page($agent);
    if ($agent->form_with_fields('SCROLL^F')) {
      $agent->click('SCROLL^F');
    } else {
      $continue = 0;
    }
  }
}

# Print out all the entries on this page
sub print_titles_on_page($)
{
  my $agent = shift;
  my @titles = $agent->look_down(sub {
                                $_[0]->tag() eq 'strong' and
                                $_[0]->parent->attr('class') and
                                $_[0]->parent->attr('class') eq 'itemlisting'; });
  foreach my $title (@titles) {
    my $hold = $title->parent->parent->parent->parent->look_down(sub {
                                                          $_[0]->attr('alt') and
                                                          $_[0]->attr('alt') eq 'Place Hold'; });
    my $id = "";
    my $call_no = "";
    if ($hold && $hold->parent && $hold->parent->attr('href') =~ /item_id=([^&]+)&.*?callnum=([^ "&]+)/) {
      $id = $1;
      $call_no = $2;
    }
    print $call_no . "\t" . $title->as_text . "\t" . $id . "\n";
  }
}

sub clean_string($)
{
    my $string = shift;
    $string =~ s#^.*?(<form id="renewitems" [^>]+>)#<html><body>\1#s;
    $string =~ s#</form>.*#</form></html>#s;
    $string =~ s#<table border="0" bordercolor="red".*(<table border="0" bordercolor="blue" cellspacing="0" cellpadding="0">)#\1#s;
    $string =~ s#</table>.*</form>#</table></form>#s;
# Clean up for parsing
    $string =~ s#<!-- Print the date due -->##g;
    $string =~ s#<br> <!-- Displays Date -->##g;
    return $string;
}

sub log_in_to_library($$$) {
    my $agent = shift;
    my $username = shift;
    my $password = shift;
    $agent->get('http://beta.torontopubliclibrary.ca/youraccount');
    $agent->form_name('form_signin');
    $agent->current_form->value('userId', $username);
    $agent->current_form->value('password', $password);
    $agent->submit();
}

Ah, Emacs!

View or add comments (Disqus), or e-mail me at sacha@sachachua.com

Quantified Awesome: Analyzing my non-fiction reading, and why I don’t mind paying taxes

Posted: - Modified: | quantified

Update Aug 22 2013: See presentation at the end of this post.

imageI built library-book tracking into Quantified Awesome in October 2011, hard-coding the patterns used by the Toronto Public Library system. I regularly hit the 50-book checkout limit and sometimes have to check items out on my husband’s account, so it really helps to have a system that can tell me what’s due and when on all the library cards that we have.

Crunching the numbers

In the 668 days (or 95.5 weeks) between October 1, 2011 and the time I exported my data for number-crunching, we checked out 1,252 items, or an average of 13 items a week. That included 250 movies and 44 other videos (TV series, documentaries, and so on). I’m boggled to find out that I checked out only 8 science fiction books. There were 152 other fiction books, including graphic novels and manga.

… 250 movies borrowed from the library results in saving of $150+ a month assuming we snag DVDs at $15. Not that we would watch 2.6 movies a week if we had to pay for them. In November 2011, I tracked the retail prices and page count of the books I read: $1,075 and 10,671 pages in a month, boggle. I don’t read all of those pages thoroughly, mind you; I tend to skim books looking for just what I need. Still, there’s no denying that the Toronto Public Library saves me a heck of a lot of book and entertainment money.

I figured I’d probably want to take a look at my reading list at some point, so I had programmed the system to record titles and Dewey Decimal System classifications as well. Fiction books and feature movies tend to have generic codes, but nonfiction books show me interesting patterns in my reading habits.

Here are my top categories:

Dewey Decimal Classification Number of items
650 – Management & auxiliary services 328
330 – Economics 82
150 – Psychology 59
740 – Drawing & decorative arts 45
800 – Literature, rhetoric & criticism 38
000 – Computer science, knowledge & systems 33
640 – Home economics & family living 30
610 – Medical sciences; Medicine 22
300 – Social sciences, sociology & anthropology 18
340 – Law 14

Here are my top sub categories:

Dewey Decimal Classification Number of items
658 – General management 213
650 – Management & auxiliary services 92
332 – Financial economics 45
808 – Rhetoric & collections of literature 37
741 – Drawing & drawings 28
158 – Applied psychology 24
153 – Mental processes and intelligence 22
641 – Food & drink 19
005 – Computer programming, programs & data 18
613 – Personal health & safety 15

Here they are, broken down by Dewey decimal group:

Dewey Decimal Classification Number of items
650 – Management & auxiliary services 328
658 – General management 213
650 – Management & auxiliary services 92
657 – Accounting 9
659 – Advertising & public relations 5
651 – Office services 4
652 – Processes of written communication 3
653 – Shorthand 2
330 – Economics 82
332 – Financial economics 45
338 – Production 15
331 – Labor economics 10
330 – Economics 9
339 – Macroeconomics & related topics 2
333 – Land economics 1
150 – Psychology 59
158 – Applied psychology 24
153 – Mental processes and intelligence 22
155 – Differential and developmental psychology 11
152 – Perception, movement, emotions, and drives 1
150 – Psychology 1
740 – Drawing & decorative arts 45
741 – Drawing & drawings 28
743 – Drawing & drawings by subject 10
745 – Decorative arts 7
800 – Literature, rhetoric & criticism 38
808 – Rhetoric & collections of literature 37
809 – Literary history & criticism 1
000 – Computer science, knowledge & systems 33
005 – Computer programming, programs & data 18
006 – Special computer methods 12
001 – Knowledge 2
003 – Systems 1
640 – Home economics & family living 30
641 – Food & drink 19
646 – Sewing, clothing, personal living 5
640 – Home economics & family living 3
647 – Management of public households 1
644 – Household utilities 1
643 – Housing & household equipment 1
610 – Medical sciences; Medicine 22
613 – Personal health & safety 15
616 – Diseases 4
612 – Human physiology 3
300 – Social sciences, sociology & anthropology 18
306 – Culture & institutions 7
303 – Social processes 4
305 – Social groups 3
302 – Social interaction 3
304 – Factors affecting social behavior 1
340 – Law 14
346 – Private law 8
343 – Military, tax, trade, industrial law 4
349 – Law of specific jurisdictions & areas 2
490 – Other languages 13
495 – Languages of East & Southeast Asia 13
690 – Buildings 12
690 – Buildings 8
695 – Roof covering 2
696 – Utilities 1
692 – Auxiliary construction practices 1
170 – Ethics (Moral philosophy) 9
170 – Ethics (Moral philosophy) 5
174 – Occupational ethics 2
171 – Ethical systems 2
370 – Education 6
371 – School management; special education; alternative education 5
372 – Elementary education 1
720 – Architecture 5
729 – Design & decoration 2
728 – Residential & related buildings 2
720 – Architecture 1
810 – American literature in English 5
813 – Fiction 2
818 – Miscellaneous writings 1
819 – Puzzle activities 1
814 – Essays 1
680 – Manufacture for specific uses 4
684 – Furnishings & home workshops 2
688 – Other final products & packaging 1
686 – Printing & related activities 1
770 – Photography & photographs 4
775 – Digital photography 2
779 – Photographs 1
778 – Fields & kinds of photography 1
470 – Italic languages; Latin 4
478 – Classical Latin usage 4
910 – Geography & travel 4
915 – Asia 4
420 – English & Old English 3
422 – English etymology 2
428 – Standard English usage 1
510 – Mathematics 3
519 – Probabilities & applied mathematics 3
400 – Language 3
401 – Philosophy & theory 3
070 – News media, journalism & publishing 3
070 – News media, journalism & publishing 3
620 – Engineering & Applied operations 3
629 – Other branches of engineering 2
620 – Engineering & Applied operations 1
390 – Customs, etiquette, folklore 2
398 – Folklore 2
160 – Logic 2
160 – Logic 2
360 – Social services; association 2
362 – Social welfare problems & services 2
020 – Library & information sciences 2
025 – Library operations 1
021 – Library relationships 1
750 – Painting & paintings 2
759 – Geographical, historical, areas, persons treatment 1
751 – Techniques, equipment, forms 1
380 – Commerce, communications, transport 2
381 – Internal commerce (Domestic trade) 2
970 – General history of North America 2
977 – General history of North America; North central United States 1
974 – General history of North America; Northeastern United States 1
700 – Arts 2
709 – Historical, areas, persons treatment 1
700 – Arts 1
190 – Modern Western philosophy 1
191 – Modern Western philosophy of the United States and Canada 1
030 – Encyclopedias & books of facts 1
031 – Encyclopedias in American English 1
780 – Music 1
786 – Keyboard & other instruments 1
290 – Other & comparative religions 1
296 – Judaism 1
210 – Natural theology 1
210 – Natural theology 1
520 – Astronomy & allied sciences 1
523 – Specific celestial bodies & phenomena 1
950 – General history of Asia; Far East 1
952 – General history of Asia; Japan 1
710 – Civic & landscape art 1
712 – Landscape architecture 1
630 – Agriculture 1
635 – Garden crops (Horticulture) 1
500 – Sciences 1
501 – Philosophy & theory 1
Grand Total 776

I read a lot of management, personal finance, and psychology books. I enjoy reading them. I read them at breakfast, over lunch, before bed, on weekend afternoons. I’m not surprised by the proportions, although I’m a little surprised by the number – have I really checked out an average of eight nonfiction books a week? Gotten through more than 300 management-related books? Neat.

Time

The time data in my current system goes back to November 29, 2011. Excluding the nonfiction books that were returned before then (although still including the books I currently have checked out that I haven’t read yet), there are 727 nonfiction books checked out. Let’s assume I’ve read or skimmed through 80% of those (I’m probably closer to 90%) – that’s ~580 books. I’ve tracked 123.3 hours as “Discretionary – Productive – Nonfiction”. This undercounts the number of hours because I tend to read things over meals and during subway commutes, so let’s double that time to be in the right ballpark for multitasking. That’s a little less than half an hour per book… which is actually quite reasonable, considering I skim through most books in 10-15 minutes each and spend maybe two hours reading selected books in depth (the ones that I take notes on, for example).

This is what my nonfiction reading habit looks like, with the dark boxes indicating when I read more. (This doesn’t take into account reading while doing other things.)

image

That’s interesting… I read a lot more frequently when I was starting up my business in January/February 2012 (although I wonder what happened in April!). I read more sporadically now. I think it’s because I’m re-figuring-out my strategies for taking notes and applying ideas to my life.

How do I pick books to read?

The library releases lists of new books on the 15th or 16th of every month. I’ve written a small script that extracts the titles, authors, and IDs of the book into a text file that I can review. I delete the lines that I’m not interested in, and my script then requests the books that remain on the list. I monitor the new releases because I don’t want to wait for the usual press

When I ‘m learning about a topic, I tend to check out six or more books related to it. A wide variety of books lets me see different viewpoints, and I can focus on books of better quality.

I occasionally look at Amazon’s recommendations for other ideas, although the books are often not yet available at the library.

Sketchnoting a new release can have high impact, which is another reason why I monitor the new releases. I sometimes reach out to publishers for review copies as well.

The library doesn’t carry everything. I usually add other interesting releases to my Amazon wishlist. I rarely buy books, though, because there’s just such an interesting backlog that I haven’t yet gotten through. I buy books if there are clever illustrations that I’d like to use for ongoing inspiration, or if I want to give the book to a friend, or if it’s an older book that someone has recommended to me and the library doesn’t stock it. Now that I have a business, I usually file those books as business expenses.

So much for quantity. If I’m reading all that, what am I doing with it?

I use books to:

Many of the ideas I pick up from books resurface in my blog posts and experiments. Books help me recognize what’s going on in real life, because the authors have already come up with words for them. I also like applying the advice from books – much to learn.

I frequently recommend books to other people. Visual book reviews make that easier. I try to slow down and recognize books that I’ll probably recommend to others so that I can make visual book reviews of them while I have the book. Sometimes I’ll take quick text notes for myself and then use that for reference. If I find myself recommending the book frequently, then I’ll check it out again and make better notes.

I don’t review my book notes much, relying instead on situations to trigger my memories. When I come across something that’s related to a book I’ve read, or I talk to someone who could benefit from a book recommendation, I dig through my visual and text-based notes.

Next steps

“Better” isn’t about reading more books – it’s about being able to apply, organize, and share what I’m learning from those books. I want to learn more about doing good research: identifying a topic to explore, synthesizing insights from multiple sources, and adding personal experiences or ideas. The process might look like this:

Visual book reviews are another way for me to grow. In addition to making visual book reviews of interesting new releases, I’d like to revisit the books that were a big influence on me in order to make visual reviews of them too.

Yay for Quantified Self and tracking. Onward!

(Also, if you’re curious about tracking your own library use: I can probably extend Quantified Awesome to support other libraries with online interfaces, but you’re going to need to walk me through how your library works.)


Update from August 22, 2013: Here’s a presentation I’m putting together for Quantified Self Toronto.

View or add comments (Disqus), or e-mail me at sacha@sachachua.com

Sketchnotes: Sal Sloan of Fetching! at the Toronto Public Library: Small Business Networking Event

Posted: - Modified: | business, connecting, delegation, entrepreneurship, sketchnotes

fetching-small

(Click on the image to see the larger version)

The Toronto Public Library hosts monthly networking events for people who are interested in starting a small business. Most people have not yet started a business. It’s a good opportunity to ask questions and learn from someone who has figured some things out.

Sal Sloan came up with the business idea for Fetching! when she got a dog. She had signed up for a fitness bootcamp, and the combination of exercising herself and walking her dog wore her out. Why not combine the two activities – help people exercise with their dogs? With a $10,000 loan from her parents, Sal started Fetching! by focusing on exercise for people and obedience training for dogs. With early success, Sal broadened her scope to focusing on helping people have active fun with their pets. She has been doing the business for two and a half years, and continues to work part-time on another job. This helps her grow the business organically by avoiding financial pressures.

One of the lessons I took away from the conversation was the power of delegating work to other people. Sal knew that other personal trainers could run sessions much better than she could, so she hired good people whom she could trust to represent her company. She’s looking for someone who can help her with the business side so that she can grow more, too. After I bank some money from this consulting engagement, I might start my delegation experiments again.

The session was an interesting contrast to last month’s meetup with Kristina Chau of notyouraverageparty, who had been in business for three years and who was struggling to scale up beyond herself. Sal has clearly put work into figuring out how to scale up, and it’s great to see how it paid off.

View or add comments (Disqus), or e-mail me at sacha@sachachua.com

New library reminder script

| geek

Gabriel Mansour reminded me to update my old library script. =) Here’s the new one: library-reminder.pl. Serious geekery may be needed to make use of this script.

View or add comments (Disqus), or e-mail me at sacha@sachachua.com

Virtual assistance process: Manage Toronto Public Library books

Posted: - Modified: | process
  1. Visit http://www.torontopubliclibrary.ca and click on Your Account. Sign in with the provided library card number and PIN.
  2. Click on Your Account, and then click on Checkouts. You will see a list of checked-out books sorted by due date.
  3. Click on the checkbox beside all items due by the following Saturday, and then click on Renew Selected Items.
  4. You should see a list of items that were renewed and items that failed to be renewed (maximum number of renewals, other users have placed holds). Copy the titles of the items that were not renewed into an e-mail under the heading “TO RETURN“, one title and date per line. Keep this list sorted by due date.
  5. Click on Holds. If there are any items under the heading Ready for Pickup, copy the titles and expiry dates to the e-mail under the heading “TO PICK UP”, one title and date per line. Keep this list sorted by due date.
  6. Click on Sign Out.
  7. Repeat steps for any other library cards indicated, summarizing the books to return and pick up under the same headings you’ve created.
  8. If there are books to return, log on to Toodledo.com. Click on Add Task in the upper left corner. Set the subject to “Return library books”, the due date to today, the context to Errands, and the note to include all the text (return and pick-up) from the e-mail. Add the task.
  9. Send me the e-mail with the title Toronto Public Library Report.
View or add comments (Disqus), or e-mail me at sacha@sachachua.com