# Category Archives: node

## De-dupe and link: Using the Flickr API to neaten up my archive and link sketches to blog posts

I've been thinking about how to manage the relationships between my blog posts and my Flickr sketches. Here's the flow of information:

2015.01.06 Figuring out information flow – index card

I scan my sketches or draw them on the computer, and then I upload these sketches to Flickr using photoSync, which synchronizes folders with albums. I include these sketches in my outlines and blog posts, and I update my index of blog posts every month. I recently added a tweak to make it possible for people to go from a blog post to its index entry, so it should be easier to see a post in context. I've been thinking about keeping an additional info index to manage blog posts and sketches, including unpublished ones. We'll see how well that works. Lastly, I want to link my Flickr posts to my blog posts so that people can see the context of the sketch.

My higher goal is to be able to easily see the open ideas that I haven't summarized or linked to yet. There's no shortage of new ideas, but it might be interesting to revisit old ones that had a chance to simmer a bit. I wrote a little about this in Learning from artists: Making studies of ideas. Let me flesh out what I want this archive to be like.

When I pull on an idea, I'd like to be able to see other open topics attached to it. I also want to be able to see open topics that might jog my memory.

How about the technical details? How can I organize my data so that I can get what I want from it?

2015.01.05 Figuring out the technical details of this idea or visual archive I want – index card

Because blog posts link to sketches and other blog posts, I can model this as a directed graph. When I initially drew this, I thought I might be able to get away with an acyclic graph (no loops). However, since I habitually link to future posts (the time traveller's problem!), I can't make that simplifying assumption. In addition, a single item might be linked from multiple things, so it's not a simple tree (and therefore I can't use an outline). I'll probably start by extracting all the link information from my blog posts and then figuring out some kind of Org Mode-based way to update the graph.
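As a rough sketch of that model (not the Org Mode-based approach, which I still need to figure out), the links could be kept as an adjacency list and walked with a "seen" set, so that loops caused by linking to future posts don't send the traversal around forever. The node names and the `links` data here are made up for illustration, and `reachableFrom` and `unlinked` are hypothetical helpers.

```javascript
// Hypothetical link data: each key links to the items in its array.
// 'post-a' and 'post-b' link to each other, so the graph has a cycle.
var links = {
  'post-a': ['sketch-1', 'post-b'],
  'post-b': ['post-a', 'sketch-2'],
  'sketch-1': [],
  'sketch-2': [],
  'sketch-3': []   // nothing links here yet: an "open" sketch
};

// Collect everything reachable from `start` (excluding `start` itself).
// The `seen` map is what makes this safe on graphs with cycles.
function reachableFrom(graph, start) {
  var seen = {};
  var stack = [start];
  while (stack.length > 0) {
    var node = stack.pop();
    if (seen[node]) continue;
    seen[node] = true;
    (graph[node] || []).forEach(function(next) { stack.push(next); });
  }
  delete seen[start];
  return Object.keys(seen).sort();
}

// List the items that nothing links to.
function unlinked(graph) {
  var hasIncoming = {};
  Object.keys(graph).forEach(function(from) {
    graph[from].forEach(function(to) { hasIncoming[to] = true; });
  });
  return Object.keys(graph).filter(function(node) {
    return !hasIncoming[node];
  }).sort();
}

console.log(reachableFrom(links, 'post-a'));
console.log(unlinked(links));
```

Pulling on `post-a` brings up everything attached to it, and `unlinked` is one way to list the sketches that haven't been blogged yet.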

To get one step closer to being able to see open thoughts and relationships, I decided that my sketches on Flickr:

• should not have duplicates despite my past mess-ups, so that:
• I can have an accurate count
• it's easier for me to categorize
• people get less confused
• should have hi-res versions if possible, despite the IFTTT recipe I tried that imported blog posts but unfortunately picked up the low-res thumbnails instead of the hi-res links
• should link to the blog posts they're mentioned in, so that:
• people can read more details if they come across a sketch in a search
• I can keep track of which sketches haven't been blogged yet

I couldn't escape doing a bit of manual cleaning up, but I knew I could automate most of the fiddly bits. I installed node-flickrapi and cheerio (for HTML parsing), and started playing.

### Removing duplicates

Most of the duplicates had resulted from the Great Renaming, when I added tags in the form of #tag1 #tag2 etc. to selected filenames. It turns out that adding these tags en masse using Emacs' writable Dired mode broke photoSync's ability to recognize the renamed files. As a result, I had files like this:

• 2013-05-17 How I set up Autodesk Sketchbook Pro for sketchnoting.png
• 2013-05-17 How I set up Autodesk Sketchbook Pro for sketchnoting #tech #autodesk-sketchbook-pro #drawing.png

This is neatly resolved by the following JavaScript:

    exports.trimTitle = function(str) {
      return str.replace(/ --.*$/g, '').replace(/#[^ ]+/g, '').replace(/[- _]/g, '');
    };

and a comparison function that compared the titles and IDs of two photos:

    // Returns true if newPhoto should replace oldPhoto: prefer the longer
    // title, and on a tie prefer the lower photo ID.
    exports.keepNewPhoto = function(oldPhoto, newPhoto) {
      if (newPhoto.title.length > oldPhoto.title.length) return true;
      if (newPhoto.title.length < oldPhoto.title.length) return false;
      if (newPhoto.id < oldPhoto.id) return true;
      return false;
    };

So then this code can process the photos:

    // Photos seen so far, keyed by trimmed title
    // (assumed to be a module-level object in the full script)
    var hash = {};

    exports.processPhoto = function(p, flickr) {
      var trimmed = exports.trimTitle(p.title);
      if (trimmed && hash[trimmed] && p.id != hash[trimmed].id) {
        // Keep whichever photo keepNewPhoto prefers; the other one
        // becomes a candidate for deletion
        if (exports.keepNewPhoto(hash[trimmed], p)) {
          exports.possiblyDeletePhoto(hash[trimmed], flickr);
          hash[trimmed] = p;
        } else {
          exports.possiblyDeletePhoto(p, flickr);
        }
      } else {
        hash[trimmed] = p;
      }
    };

You can see the code on Gist: duplicate_checker.js.

### High-resolution versions

I couldn't easily automate this, but fortunately, the IFTTT script had only imported twenty images or so, clearly marked by a description that said: "via sacha chua :: living an awesome life…". I searched for each image, deleting the low-res entry if a high-resolution image was already in the system and replacing the low-res entry if that was the only one there.

### Linking to blog posts

This was the trickiest part, but also the most fun. I took advantage of the fact that WordPress transforms uploaded filenames in a mostly consistent way. I'd previously added a bulk view that displayed any number of blog posts with very little additional markup, and I modified the relevant code in my theme to make parsing easier.

See this on Gist:

    /**
     * Adds "Blogged" links to Flickr for images that don't yet have "Blogged" in their description.
     * Command-line argument: URL to retrieve and parse
     */
    var secret = require('./secret');
    var flickrOptions = secret.flickrOptions;
    var Flickr = require("flickrapi");
    var fs = require('fs');
    var request = require('request');
    var cheerio = require('cheerio');
    var imageData = {};
    var $;

    function setDescriptionsFromURL(url) {
      request(url, function(error, response, body) {
        if (error) {
          console.log(error);
          return;
        }
        // Parse the images
        $ = cheerio.load(body);
        $('article').each(function() {
          var prettyLink = $(this).find("h2 a").attr("href");
          // Skip articles without a title link, and weekly or monthly posts
          if (prettyLink && !prettyLink.match(/weekly/i) && !prettyLink.match(/monthly/i)) {
            collectLinks($(this), prettyLink, imageData);
          }
        });
        updateFlickrPhotos();
      });
    }

    function updateFlickrPhotos() {
      Flickr.authenticate(flickrOptions, function(error, flickr) {
        flickr.photos.search(
          {user_id: flickrOptions.user_id,
           per_page: 500,
           extras: 'description',
           text: ' -blogged'}, function(err, result) {
             if (err || !result) {
               console.log(err);
               return;
             }
             processPage(result, flickr);
             // The first page is handled above; fetch page 2 through the last page
             for (var i = 2; i <= result.photos.pages; i++) {
               flickr.photos.search(
                 {user_id: flickrOptions.user_id, per_page: 500, page: i,
                  extras: 'description', text: ' -blogged'},
                 function(err, result) {
                   processPage(result, flickr);
                 });
             }
           });
      });
    }

    // Map each image mentioned in an article to that article's link
    function collectLinks(article, prettyLink, imageData) {
      var results = [];
      article.find(".body a").each(function() {
        var link = $(this);
        if (link.attr('href')) {
          if (link.attr('href').match(/sachachua/)
              || !link.attr('href').match(/^http/)) {
            imageData[exports.trimTitle(link.attr('href'))] = prettyLink;
          } else if (link.attr('href').match(/flickr\.com/)) {
            imageData[exports.trimTitle(link.text())] = prettyLink;
          }
        }
      });
      return results;
    }

    exports.trimTitle = function(str) {
      return str.replace(/^.*\//, '').replace(/^wpid-/g, '').replace(/[^A-Za-z0-9]/g, '').replace(/png$/, '').replace(/[0-9]$/, '');
    };

    function processPage(result, flickr) {
      if (!result) return;
      for (var i = 0; i < result.photos.photo.length; i++) {
        var p = result.photos.photo[i];
        // Drop the hashtags before normalizing, because trimTitle removes
        // the # characters and the tags could no longer be told apart
        var noTags = exports.trimTitle(p.title.replace(/#.*/g, ''));
        var withTags = exports.trimTitle(p.title);
        var found = imageData[noTags] || imageData[withTags];
        if (found) {
          var description = p.description._content;
          // Skip photos whose description already contains this link
          if (description.indexOf(found) !== -1) continue;
          if (description) {
            description += " - ";
          }
          description += '<a href="' + found + '">Blogged</a>';
          console.log("Updating " + p.title + " with " + description);
          flickr.photos.setMeta(
            {photo_id: p.id, description: description},
            function(err, res) {
              if (err) { console.log(err, res); }
            }
          );
        }
      }
    }

    setDescriptionsFromURL(process.argv[2]);

And sketches like 2013-11-11 How to think about a book while reading it are now properly linked to their blog posts. Yay!

Again, this script won't get everything, but it gets a decent number automatically sorted out. Next steps:

• Run the image extraction and set description scripts monthly as part of my indexing process
• Check my list of blogged images to see if they're matched up with Flickr sketches, so that I can identify images mysteriously missing from my sketchbook archive or not correctly linked

Yay code!
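For the second of those next steps, a minimal sketch of the cross-check might look like this. It applies the same normalization as `trimTitle` in the script above; `normalize` and `findMissingSketches` are hypothetical names, and the example.com URLs and sample titles are made up. A real run would take its inputs from `imageData` and the Flickr search results instead.

```javascript
// Same normalization steps as trimTitle in the linking script.
function normalize(str) {
  return str.replace(/^.*\//, '')
    .replace(/^wpid-/g, '')
    .replace(/[^A-Za-z0-9]/g, '')
    .replace(/png$/, '')
    .replace(/[0-9]$/, '');
}

// Return the blogged-image keys that have no Flickr photo with a matching title.
function findMissingSketches(imageData, flickrTitles) {
  var onFlickr = {};
  flickrTitles.forEach(function(title) {
    onFlickr[normalize(title)] = true;
  });
  return Object.keys(imageData).filter(function(key) {
    return !onFlickr[key];
  });
}

// Made-up sample data for illustration only.
var imageData = {};
imageData[normalize('http://example.com/uploads/wpid-2013-11-11-How-to-think-about-a-book-while-reading-it.png')] =
  'http://example.com/blog/how-to-think-about-a-book';
imageData[normalize('wpid-2015-01-01-A-sketch-that-never-made-it-to-Flickr.png')] =
  'http://example.com/blog/another-post';

var missing = findMissingSketches(
  imageData,
  ['2013-11-11 How to think about a book while reading it']);
console.log(missing);
```

Because the keys are normalized, the output is only good for spotting gaps; mapping a key back to a readable filename would need the original strings kept alongside it.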
## Windows: Pipe output to your clipboard, or how I’ve been using NodeJS and Org Mode together

It's not easy being on Windows instead of one of the more scriptable operating systems out there, but I stay on it because I like the drawing programs. Cygwin and Vagrant fill enough gaps to keep me mostly sane. (Although maybe I should work up the courage to dual-boot Windows 8.1 and a Linux distribution, and then get my ScanSnap working.) Anyway, I'm making do.

Thanks to Node and the abundance of libraries available through NPM, JavaScript is shaping up to be a surprisingly useful scripting language. After I used the Flickr API library for JavaScript to cross-reference my Flickr archive with my blog posts, I looked around for other things I could do with it.

photoSync occasionally didn't upload new pictures I added to its folders (or at least, not as quickly as I wanted). I wanted to replace photoSync with my own script that would:

• upload the picture only if it doesn't already exist,
• add tags based on the filename,
• add the photo to my Sketchbook photoset,
• move the photo to the "To blog" folder, and
• make it easy for me to refer to the Flickr image in my blog post or index.

The flickr-with-uploads library made it easy to upload images and retrieve information, although the format was slightly different from the Flickr API library I used previously. (In retrospect, I should've checked the Flickr API documentation first – there's an example upload request right on the main page. Oh well! Maybe I'll change it if I feel like rewriting it.)

I searched my existing photos to see if a photo with that title already existed. If it did, I displayed an Org-style list item with a link. If it didn't exist, I uploaded it, set the tags, added the item to the photo set, and moved it to the folder. Then I displayed an Org-style link, but using a plus character instead of a minus character, taking advantage of the fact that both + and - can be used for lists in Org.
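The list formatting can be sketched as a small helper. `orgListItem` is a hypothetical name and the URLs in the usage lines are placeholders; the bullet convention (`-` for photos that already existed, `+` for new uploads) is the one described above.

```javascript
// Format one Org Mode list item containing a link: "- [[url][title]]".
// bullet is '-' for a photo already on Flickr, '+' for a fresh upload.
function orgListItem(bullet, url, title) {
  if (bullet !== '-' && bullet !== '+') {
    throw new Error('Expected "-" or "+" as the bullet, got: ' + bullet);
  }
  return bullet + ' [[' + url + '][' + title + ']]';
}

var lines = [
  orgListItem('-', 'https://www.flickr.com/photos/example/1', 'Existing sketch'),
  orgListItem('+', 'https://www.flickr.com/photos/example/2', 'New sketch')
];
console.log(lines.join('\n'));
```

Since Org accepts both bullets, the pasted list stays valid while still showing at a glance which sketches were newly uploaded.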
While using console.log(...) to display these links in the terminal allowed me to mark and copy the link, I wanted to go one step further. Could I send the links directly to Emacs? I looked into getting org-protocol to work, but I was having problems figuring this out. (I solved those problems; details later in this post.)

What were some other ways I could get the information into Emacs aside from copying and pasting from the terminal window? Maybe I could put text directly into the clipboard. The node-clipboard package didn't build for me and I couldn't get node-copy-paste to work either, but the node-copy-paste README told me about the existence of the clip command-line utility, which worked for me.

On Windows, clip allows you to pipe the output of commands into your clipboard. (There are similar programs for Linux or Mac OS X.) In Node, you can start a child process and communicate with it through pipes. I got a little lost trying to figure out how to turn a string into a streamable object that I could set as the new standard input for the clip process I was going to spawn, but the solution turned out to be much simpler than that. Just write(...) to the appropriate stream, and call end() when you're done. Here's the relevant bit of code that takes my result array and puts it into my clipboard:

    var child = cp.spawn('clip');
    child.stdin.write(result.join("\n"));
    child.stdin.end();

Of course, to get to that point, I had to revise my script. Instead of letting all the callbacks finish whenever they wanted, I needed to be able to run some code after everything was done. I was a little familiar with the async library, so I used that.

I copied the output to the clipboard instead of displaying it so that I could call it easily using ! (dired-do-shell-command) and get the output in my clipboard for easy yanking elsewhere, although I could probably change my batch file to pipe the result to clip and just separate the stderr stuff. Hmm. Anyway, here it is!
See this on GitHub:

    /**
     * Upload the file to my Flickr sketchbook and then move it to
     * Dropbox/Inbox/To blog. Save the Org Mode links in the clipboard.
     * - means the photo already existed, + means it was uploaded.
     */
    var async = require('async');
    var cp = require('child_process');
    var fs = require('fs');
    var glob = require('glob');
    var path = require('path');
    var flickr = require('flickr-with-uploads');
    var secret = require("./secret");
    var SKETCHBOOK_PHOTOSET_ID = '72157641017632565';
    var BLOG_INBOX_DIRECTORY = 'c:\\sacha\\dropbox\\inbox\\to blog\\';
    var api = flickr(secret.flickrOptions.api_key,
                     secret.flickrOptions.secret,
                     secret.flickrOptions.access_token,
                     secret.flickrOptions.access_token_secret);
    var result = [];

    // Extract the #hashtags from a filename as a space-separated string
    function getTags(filename) {
      var tags = [];
      var match;
      var re = new RegExp('#([^ ]+)', 'g');
      while ((match = re.exec(filename)) !== null) {
        tags.push(match[1]);
      }
      return tags.join(' ');
    }
    // assert(getTags("foo #bar #baz qux") == "bar baz");

    // Search Flickr for a photo whose title matches the filename exactly
    function checkIfPhotoExists(filename, doesNotExist, existsFunction, done) {
      var base = path.basename(filename).replace(/\.png$/, '');
      api({method: 'flickr.photos.search',
           user_id: secret.flickrOptions.user_id,
           text: base},
          function(err, response) {
            var found = undefined;
            if (response && response.photos[0].photo) {
              for (var i = 0; i < response.photos[0].photo.length; i++) {
                if (response.photos[0].photo
                    && response.photos[0].photo[i]['$'].title == base) {
                  found = i;
                  break;
                }
              }
            }
            if (found !== undefined) {
              existsFunction(response.photos[0].photo[found], done);
            } else {
              doesNotExist(filename, done);
            }
          });
    }

    function formatExistingPhotoAsOrg(photo, done) {
      var title = photo['$'].title;
      var url = 'https://www.flickr.com/photos/'
            + photo['$'].owner + '/' + photo['$'].id;
      result.push('- [[' + url + '][' + title + ']]');
      done();
    }

    function formatAsOrg(response) {
      var title = response.photo[0].title[0];
      var url = response.photo[0].urls[0].url[0]['_'];
      result.push('+ [[' + url + '][' + title + ']]');
    }
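    // Assumed name: the header of this function and the `method` line are
    // missing from this copy of the script.
    function uploadImage(filename, done) {
      api({
        method: 'upload',
        title: path.basename(filename.replace(/\.png$/, '')),
        is_public: 1,
        hidden: 1,
        safety_level: 1,
        photo: fs.createReadStream(filename),
        tags: getTags(filename.replace(/\.png$/, ''))
      }, function(err, response) {
        if (err) {
          console.log('Could not upload photo: ', err);
          done();
        } else {
          var newPhoto = response.photoid[0];
          async.parallel(
            [
              function(done) {
                api({method: 'flickr.photos.getInfo',
                     photo_id: newPhoto}, function(err, response) {
                       if (response) { formatAsOrg(response); }
                       done();
                     });
              },
              function(done) {
                // Assumed call: the start of this request is also missing here
                api({method: 'flickr.photosets.addPhoto',
                     photoset_id: SKETCHBOOK_PHOTOSET_ID,
                     photo_id: newPhoto}, function(err, response) {
                       if (!err) {
                         moveFileToBlogInbox(filename, done);
                       } else {
                         console.log('Could not add ' + filename + ' to Sketchbook');
                         done();
                       }
                     });
              }],
            function() {
              done();
            });
        }
      });
    }

    // Assumed name: the header of this function is missing from this copy.
    function moveFileToBlogInbox(filename, done) {
      fs.rename(filename, BLOG_INBOX_DIRECTORY + path.basename(filename),
                function(err) {
                  if (err) { console.log(err); }
                  done();
                });
    }

    var args = process.argv.slice(2);
    async.each(args, function(item, done) {
      if (item.match('\\*')) {
        // Expand wildcards ourselves, since the Windows shell doesn't
        glob.glob(item, function(err, files) {
          if (!files) return done();
          async.each(files, function(file, done) {
            checkIfPhotoExists(file, uploadImage, formatExistingPhotoAsOrg, done);
          }, function() {
            done();
          });
        });
      } else {
        checkIfPhotoExists(item, uploadImage, formatExistingPhotoAsOrg, done);
      }
    }, function(err) {
      // Everything has finished: show the links and copy them to the clipboard
      console.log(result.join("\n"));
      var child = cp.spawn('clip');
      child.stdin.write(result.join("\n"));
      child.stdin.end();
    });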


Wheeee! Hooray for automation. I made a Windows batch script like so:

up.bat

    node g:\code\node\flickr-upload.js %*


and away I went. Not only did I have a handy way to process images from the command line, I could also mark the files in Emacs Dired with m, then type ! to execute my up command on the selected images. Mwahaha!

Anyway, I thought I'd write it up in case other people were curious about using Node to code little utilities, filling the clipboard in Windows, or getting data back into Emacs (sometimes the clipboard is enough).
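For anyone not on Windows, the same write-then-end pattern should carry over to the equivalent utilities. The sketch below picks a command by platform; `clipboardCommand` and `copyToClipboard` are hypothetical helper names, and it assumes that `pbcopy` (Mac OS X) or `xclip` (Linux) is installed and on the PATH.

```javascript
var cp = require('child_process');

// Pick a clipboard utility for a platform string such as process.platform.
function clipboardCommand(platform) {
  if (platform === 'win32') return {cmd: 'clip', args: []};
  if (platform === 'darwin') return {cmd: 'pbcopy', args: []};
  // Assumption: xclip is available on other Unix-like systems.
  return {cmd: 'xclip', args: ['-selection', 'clipboard']};
}

// Pipe text into the clipboard utility's standard input.
function copyToClipboard(text) {
  var tool = clipboardCommand(process.platform);
  var child = cp.spawn(tool.cmd, tool.args);
  // Report a missing utility instead of crashing on an unhandled error event.
  child.on('error', function(err) {
    console.log('Could not run ' + tool.cmd + ': ' + err.message);
  });
  child.stdin.on('error', function() {});
  child.stdin.write(text);
  child.stdin.end();
}
```

In the upload script, `copyToClipboard(result.join("\n"))` would then stand in for the three lines that spawn `clip` directly.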

Back to org-protocol, since I was curious about it. With (require 'org-protocol) (server-start), emacsclient org-protocol://store-link:/foo/bar worked when I entered it at the command prompt. I was having a hard time getting it to work under Node, but eventually I figured out that:

• I needed to pass -n as one of the arguments to emacsclient so that it would return right away.
• The : after store-link is important! I was passing org-protocol://store-link/foo/bar and wondering why it opened up a file called bar. org-protocol://store-link:/foo/bar was what I needed.

I only just figured out that last bit while writing this post. Here's a small demonstration program:

    var cp = require('child_process');
    var child = cp.execFile('emacsclient', ['-n', 'org-protocol://store-link:/foo/bar']);
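To store a real link rather than /foo/bar, the URL and title presumably need to be percent-encoded so that their own slashes aren't mistaken for separators. Here is a minimal sketch, assuming the `store-link:/URL/TITLE` layout used above and that org-protocol decodes the two parts on the Emacs side; `storeLinkArgs` and `storeLink` are hypothetical helpers, and the URL in the usage note is a placeholder.

```javascript
var cp = require('child_process');

// Build the emacsclient arguments for an org-protocol store-link request.
// -n makes emacsclient return right away instead of waiting for Emacs.
function storeLinkArgs(url, title) {
  return ['-n',
          'org-protocol://store-link:/' + encodeURIComponent(url) +
          '/' + encodeURIComponent(title)];
}

// Ask a running Emacs server to store the link; log if emacsclient fails.
function storeLink(url, title) {
  cp.execFile('emacsclient', storeLinkArgs(url, title), function(err) {
    if (err) { console.log('emacsclient failed: ' + err.message); }
  });
}
```

Calling `storeLink('https://www.flickr.com/photos/example/1', 'My sketch')` would pass a single encoded argument to emacsclient, with the colon after store-link kept in place.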


Yay!

2015-01-13 Using Node as a scripting tool – index card #javascript #nodejs #coding #scripting